Introduction

1.1 About this book


This book is intended to help scientists and engineers learn version 3 of the Python 
programming language and its associated NumPy, SciPy and Matplotlib libraries. No 
prior programming experience or scientific knowledge in any particular field is assumed. 
However, familiarity with some mathematical concepts such as trigonometry, complex 
numbers and basic calculus is helpful to follow the examples and exercises. 

Python is a powerful language with many advanced features and libraries; while the 
basic syntax of the language is straightforward to learn, it would be impossible to teach 
it in depth in a book of this size. Therefore, we aim for a balanced, broad introduction to 
the central features of the language and its important libraries. The text is interspersed
with examples relevant to scientific research, and at the end of most sections there are 
questions (short problems designed to test knowledge) and exercises (longer problems 
that usually require a short computer program to solve). Although it is not necessary 
to complete all of the exercises, readers will find it useful to attempt at least some of
them. Where a section, example or exercise contains more advanced material that may 
be skipped on first reading, this is indicated with the symbol ♢.

In Chapter 2 of this book, the basic syntax, data structures and flow control of a 
Python program are introduced. Chapter 3 is a short interlude on the use of the Pylab 
library for making graphical plots of data: this is useful to visualize the output of 
programs in subsequent chapters. Chapter 4 provides more advanced coverage of the 
core Python language and a brief introduction to object-oriented programming. There 
follows another short chapter introducing the popular IPython and IPython Notebook 
environments, before chapters on scientific programming with NumPy, Matplotlib and 
SciPy. The final chapter covers more general topics in scientific programming, including 
floating point arithmetic, algorithm stability and programming style. 

Readers who are already familiar with the Python programming language may wish 
to skim Chapters 2 and 4. 

Code examples and exercise solutions may be downloaded from the book’s website at
scipython.com. Note that while comments have been included in these downloadable
programs, they are not so extensive in the printed version of this book: instead, the code
is explained in the text itself through numbered annotations (such as ①). Readers typing
in these programs for themselves may wish to add their own explanatory comments to
the code.


1.2 About Python

Python is a powerful, general-purpose programming language devised by Guido van 
Rossum in 1989. 1 It is classified as a high-level programming language in that it
automatically handles the most fundamental operations (such as memory management)
carried out at the processor level (“machine code”). It is considered a higher-level 
language than, for example, C, because of its expressive syntax (which is close to 
natural language in some cases) and rich variety of native data structures such as lists, 
tuples, sets and dictionaries. For example, consider the following Python program which 
outputs a list of names on separate lines. 

Listing 1.1 Outputting a list of names using a program written in Python

# eg1-names.py: output three names to the console.

names = ['Isaac Newton', 'Marie Curie', 'Albert Einstein']
for name in names:
    print(name)


Output: 

Isaac Newton 
Marie Curie 
Albert Einstein 

Now compare this with the equivalent program in C. 

Listing 1.2 Outputting a list of names using a program written in C

/* eg1-names.c: output three names to the console. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_LENGTH 20
#define NUMBER_OF_STRINGS 3

int main()
{
    int i;
    char names[NUMBER_OF_STRINGS][MAX_STRING_LENGTH+1];

    strcpy(names[0], "Isaac Newton");
    strcpy(names[1], "Marie Curie");
    strcpy(names[2], "Albert Einstein");

    for (i=0;i<NUMBER_OF_STRINGS;i++) {
        fprintf(stdout, "%s\n", names[i]);
    }

    return EXIT_SUCCESS;
}


1 Python’s “benevolent dictator for life.” 
Even if you are not familiar with the C language, you can see there is quite a lot of
overhead involved in coding even this simple task in C: three includes of libraries not
loaded by default, explicit declarations of variables to hold the list (“array”, in C) of
names, names, and a counter, i, and explicit indexing of this array in a for loop; you
even need to add the line endings (‘\n’ is the “new line” character). This source code
then has to be compiled - converted into the machine code that the computer processor
understands - before it can be run (executed). Furthermore, there is plenty of scope for
errors (bugs): trying to print the name stored in names[10] will likely cause junk to be
output: the C compiler won’t stop you from accessing this nonexistent name.
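
For comparison, here is a minimal sketch of what happens when the same out-of-range access is attempted in Python. The index 10 is taken from the text; the variable name message is introduced here purely for illustration. Instead of printing junk, the interpreter raises an IndexError, which the program can catch and report.

```python
names = ['Isaac Newton', 'Marie Curie', 'Albert Einstein']

try:
    # Only indexes 0, 1 and 2 exist, so this access fails cleanly.
    print(names[10])
except IndexError as err:
    message = str(err)
    print('IndexError:', message)
```

The point of the sketch is only that the error is detected and named at run time; how a real program should respond to it depends on the application.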

The same program written in three lines of Python is clean and expressive: we do 
not have to explicitly declare that names is a list of strings, there is no need for a 
loop counter like i and there are no separate libraries to include (import in Python). 
To run the Python program, one simply needs to type python eg1-names.py which
will automatically invoke the Python “interpreter” to compile and then run the resulting 
“byte-code” (a kind of intermediate representation of the program between its source 
and the ultimate machine code that Python dispatches to the processor). 

Python’s syntax aims to ensure that “There should be one - and preferably only one - 
obvious way to do it.” This differs from some other popular high-level languages such as 
Ruby and Perl, which take the opposite approach, encapsulated by the mantra “there’s 
more than one way to do it.” For example, there are (at least) four obvious ways to 
output the same list in Perl: 2 

Listing 1.3 Different ways to output a list of names using a program written in Perl 

@names = ("Isaac Newton", "Marie Curie", "Albert Einstein"); 

# Method 1 

print "$_\n" for @names; 

# Method 2 

print join "\n", @names; 
print "\n"; 

# Method 3 

print map { "$_\n" } @names; 

# Method 4 
$" = "\n"; 
print "@names\n"; 


(Note also Perl’s famously concise but somewhat opaque syntax.) 


1.2.1 Advantages and disadvantages of Python 

Here are some of the main advantages of the Python programming language and why 
you might want to use it: 


2 Well, obvious to Perl programmers. 
• Its clean and simple syntax makes writing Python programs fast and generally
minimizes opportunities for bugs to creep in. When done right, the result is
high-quality software that is easy to maintain and extend.

• It’s free: Python and its associated libraries are free of cost and open source,
unlike commercial offerings such as Mathematica.

• Cross-platform support: Python is available for every commonly available 
computer system, including Windows, Unix, Linux and Mac OS X. Although 
platform-specific extensions exist, it is possible to write code that will run on any 
platform without modification. 

• Python has a large library of modules and packages that extend its functionality. 
Many of these are available as part of the “standard library” provided with the
Python interpreter itself. Others, including the NumPy, SciPy and Matplotlib 
libraries used in scientific computing, can be downloaded separately for no cost. 

• Python is relatively easy to learn. The syntax and idioms used for basic operations 
are applied consistently in more advanced usage of the language. Error messages 
are generally meaningful assessments of what went wrong rather than the generic 
“crashes” that can occur in compiled lower-level languages such as C. 

• Python is flexible: it is often described as a “multi-paradigm” language that 
contains the best features from the procedural, object-oriented and functional 
programming paradigms. There is little need for the work-arounds required in 
some languages when a problem can only be solved cleanly with one of these 
approaches. 

So where’s the catch? Well, Python does have some disadvantages and isn’t suitable
for every application.

• The speed of execution of a Python program is not as fast as that of some other, fully
compiled languages such as C and Fortran. For heavily numerical work, the
NumPy and SciPy libraries alleviate this to some extent by using compiled
C code “under the hood,” but at the expense of some reduced flexibility. For
many, many applications, however, the speed difference is not noticeable and the
reduced speed of execution is more than offset by a much faster speed of
development. That is, it takes much less time to write and debug a Python program than
to do the same in C, C++ or Java.

• It is hard to hide or obfuscate the source code of a Python program to prevent 
others from copying or modifying it. However, this doesn’t mean that successful 
commercial Python programs don’t exist. 

• A common complaint about Python has historically been that its rapid
development has led to compatibility issues between versions. Certainly there are
important differences between Python 2 and Python 3 (described in the next
section), but the complaint stems from the fact that within the Python 2 series,
there were major improvements and additions to the language that meant that
code written in a later version (say, 2.7) would not run on an earlier version of
Python (e.g., 2.6), although code written for an earlier version of Python will
always run on a later version (within the same branch, 2 or 3). If you use the
latest version of Python (see Section 1.3) you probably won’t run into a problem,
but some operating systems that come with Python are rather conservative and
install by default only an older version.


1.2.2 Python 2 or Python 3? 

At the time of writing, Python users have a choice to make: whether to use the older, 
more established Python 2 version of the language or the newer Python 3. Although the 
differences between the two versions may seem minor, code written in Python 3 will 
not run under Python 2 and vice versa: Python 3 is not backward-compatible with its 
predecessor. This book teaches Python 3. 

The latest major version of Python 2, Python 2.7, will be the last of that branch. Since
the first release of Python 3 (version 3.0, at the end of 2008), the number of users and
extent of library support for Python 3 have grown to the point that new users would find
little benefit in learning Python 2 except to maintain legacy code.

There are several reasons for major change between versions (breaking your users’
existing code is not something to be undertaken lightly): Python 3 fixes some ugly
quirks and inconsistencies in the language and provides Unicode support for all strings
(eliminating a lot of the confusion that is created in dealing with Unicode and
non-Unicode strings in Python 2). Unicode is an international standard for the representation
of text in most of the writing systems in the world.
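
As a small illustration of what Unicode support for all strings means in practice, the following sketch (with example strings chosen here, not taken from the text) shows that a Python 3 string is measured in characters, while its encoded form may occupy more bytes.

```python
# In Python 3 every str object is a sequence of Unicode characters.
name = 'Schrödinger'
greek = 'ψ'

n_chars = len(name)                    # number of characters, not bytes
n_bytes = len(name.encode('utf-8'))    # the character 'ö' needs two bytes in UTF-8

print(name, n_chars, n_bytes)
print(greek, ord(greek))               # ord gives the Unicode code point
```

The choice of UTF-8 here is an assumption made for the example; other encodings would give different byte counts.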

It is anticipated that most users of this book will not have trouble converting their own 
code between the two versions of Python if necessary. Where important, the differences 
are pointed out in the text. 


1.3 Installing Python 

The official website of Python is www.python.org/, and contains full and easy-to-follow 
instructions for downloading Python. However, there are several full distributions which 
include the NumPy, SciPy and Matplotlib libraries (the “SciPy stack”) to save you from 
having to download and install these yourself: 

• Anaconda is available for free (including for commercial use) from
http://continuum.io/downloads. It installs both Python 2 and Python 3, but the default
version can be selected either before downloading as indicated on this web page,
or subsequently using the ‘conda’ command.

• Enthought Canopy is a similar distribution with a free version and various tiers 
of paid-for versions including technical support and development software.

In most cases, one of these distributions should be all you need. We provide some
platform-specific notes below. 

The source code (and binaries for some platforms) for the NumPy, SciPy, Matplotlib 
and IPython packages are available separately at: 
• NumPy: http://sourceforge.net/projects/numpy/

• SciPy: http://sourceforge.net/projects/scipy/ 

• Matplotlib: http://matplotlib.org/downloads.html 

• IPython: https://github.com/ipython/ipython/releases 

Windows 

Windows users have a couple of further options for installing the full SciPy stack:
Python(x,y) (https://code.google.com/p/pythonxy/) and WinPython
(http://winpython.sourceforge.net/). Both are free.

Mac OS X 

Mac OS X, being based on Unix, comes with Python, but it is usually an older version 
of Python 2. You must not delete or modify this installation (it’s needed by the operating 
system), but you can follow the instructions above for obtaining Python 3 and the SciPy 
stack. OS X does not have a native package manager (an application for managing 
and installing software), but the two popular third-party package managers, Homebrew
(http://brew.sh/) and MacPorts (www.macports.org/), can both supply Python 3 and its 
packages if you prefer this option. 

Linux 

Almost all Linux distributions come with Python 2, but usually not Python 3, so you 
will need to install it from the links above: the Anaconda and Canopy distributions 
both have versions for Linux. Most Linux distributions come with their own software
package managers (e.g., apt in Debian and rpm for RedHat). These can be used to 
install Python 3 and its libraries, though finding the necessary package repositories may 
take some research on the Internet. Be careful not to replace or modify your system 
installation as other applications may depend on it. 


1.4 The command line 

Most of the code examples in this book are written as standalone programs which can 
be run from the command line (or from within an integrated development environment 
(IDE) if you use one: see Section 9.3.2). To access the command line interface (also 
known as a console or terminal) on different platforms, follow the instructions below. 

• Windows 7 and earlier: Start > All Programs > Command Prompt; alternatively,
type cmd in the Start > Run input box.

• Windows 8: Preview (lower left of screen) > Windows System: All apps; alternatively,
type ‘cmd’ in the search box pulled down from the top-right corner of the screen.

• Mac OS X: Finder > Applications > Utilities > Terminal 

• Linux: if you are not using a graphical interface you are already at the command 
line; if you are, then locate the Terminal application (distributions vary, but it is 
usually found within a System Utilities or System Tools subfolder). 
Commands typed at the command line are interpreted by an application called a shell,
which allows the user to navigate the file system and is able to start other applications.
For example, the command

    python myprog.py

instructs the shell to invoke the Python interpreter, sending it the file myprog.py as
the script to execute. Output from the program is then returned to the shell and displayed
in your console.
2.1 The Python Shell

This chapter introduces the syntax, structure and data types of the Python programming
language. The first few sections do not involve writing much beyond a few
statements of Python code and so can be followed using the Python shell. This
is an interactive environment: the user enters Python statements that are executed
immediately after the Enter key is pressed.

The steps for accessing the “native” Python shell differ by operating system. To start 
it from the command line, first open a terminal using the instructions from Section 1.4 
and type python. 

To exit the Python shell, type exit().

When you start the Python shell, you will be greeted by a message (which will vary
depending on your operating system and precise Python version). On my system, the
message reads:

Python 3.3.5 |Anaconda 2.0.1 (x86_64)| (default, Mar 10 2014, 11:22:25)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

The three chevrons (>>>) are the prompt, which is where you will enter your Python
commands. Note that this book is concerned with Python 3, so you should check that
the Python version number reported on the first line is Python 3.X.Y where the precise
values of the minor version numbers X and Y should not be important.
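
The version can also be checked from within Python itself. The following is a minimal sketch using the standard library's sys module; the variable names major, minor and is_python3 are introduced here for illustration.

```python
import sys

# sys.version_info holds the components of the interpreter's version number.
major, minor = sys.version_info.major, sys.version_info.minor
print('Running Python {}.{}'.format(major, minor))

is_python3 = (major == 3)
if not is_python3:
    print('Warning: the examples in this book assume Python 3.')
```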

Many Python distributions come with a slightly more advanced shell called IDLE, 
which features tab-completion, and syntax highlighting (Python keywords are colored 
specially when you type them). We will pass over the use of this application in favor of 
the newer and more advanced IPython environment, discussed in Chapter 5. 

It is also possible for many installations (especially on Windows) to start a Python 
shell directly from an application installed when you install the Python interpreter itself. 
Some installations even add a shortcut icon to your Desktop which will open a Python 
shell when you click on it. 
2.2 Numbers, variables, comparisons and logic

2.2.1 Types of numbers

Among the most basic Python objects are the numbers, which come in three types:
integers (type: int), floating point numbers (type: float) and complex numbers (type:
complex).

Integers 

Integers are whole numbers such as 1, 8, -72 and 3847298893721407. In Python 3,
there is no limit to their magnitude (apart from the availability of your computer’s
memory). 1 Integer arithmetic is exact.

Floating point numbers 

Floating point numbers are the representation of real numbers such as 1.2, -0.36 and
$1.67263 \times 10^{-7}$. They do not, in general, have the exact value of the real number
they represent, but are stored in binary to a certain precision (on most systems, to the
equivalent of 15-16 decimal places), 2 as explained in Section 9.1. For example, the
number $4/3$ is stored as the binary equivalent of 1.33333333333333325931846502...,
which is nearly (but not quite) the same as the infinitely repeating decimal representation
of $4/3 = 1.3333\ldots$. Moreover, even numbers that do have an exact decimal
representation may not have an exact binary representation: for example 1/10 is represented by
the binary number equivalent to 0.10000000000000000555111512.... Because of this
finite precision, floating point arithmetic is not exact but, with care, it is “good enough”
for most scientific applications.
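
The stored value of 1/10 quoted above can be inspected directly. The sketch below uses the standard library's decimal module to display the exact value held by the float 0.1, and math.isclose (an assumption here: it requires Python 3.5 or later) to compare floats within a tolerance instead of testing for exact equality.

```python
import math
from decimal import Decimal

# Decimal(x) shows the exact value of the binary number stored for the float x.
exact_tenth = Decimal(0.1)
print(exact_tenth)

total = 0.1 + 0.2
print(total == 0.3)               # False: both sides carry rounding errors
print(math.isclose(total, 0.3))   # True: equal to within a relative tolerance
```

Comparing with a tolerance in this way is one common approach to the "with care" mentioned in the text; the appropriate tolerance depends on the calculation.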

Any single number containing a period (‘.’) is considered by Python to specify a
floating point number. Scientific notation is supported using ‘e’ or ‘E’ to separate the
significand (mantissa) from the exponent: for example, 1.67263e-7 represents the
number $1.67263 \times 10^{-7}$.

Complex numbers 

Complex numbers such as 4 + 3j consist of a real and an imaginary part (denoted by j in
Python), each of which is itself represented as a floating point number (even if specified
without a period). Complex number arithmetic is therefore not exact but subject to the
same finite precision considerations as floats.

A complex number may be specified either by “adding” a real number to an imaginary
one (denoted by the j suffix), as in 2.3 + 1.2j or by separating the real and imaginary
parts in a call to complex, as in complex(2.3, 1.2).
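
A short sketch confirming that the two forms are equivalent, and that the real and imaginary parts are stored as floats even when given as integers. The names z1, z2 and z3 are chosen here for illustration.

```python
z1 = 2.3 + 1.2j
z2 = complex(2.3, 1.2)
print(z1 == z2)

# Integer arguments are converted: both parts are held as floats.
z3 = complex(4, 3)
print(z3, type(z3.real).__name__, type(z3.imag).__name__)
```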


1 In Python 2, there were two kinds of integer: “simple” integers (system-dependent, but usually stored in
either 32 or 64 bits) and “long” integers (of any size), indicated with the suffix L.

2 This corresponds to the implementation of the IEEE 754 double-precision standard.
Example E2.1 Typing a number at the Python shell prompt simply echoes the number
back to you:

>>> 5
5
>>> 5.
5.0
>>> 0.10                                                      ①
0.1
>>> 0.0001
0.0001
>>> 0.0000999                                                 ②
9.99e-05

Note that the Python interpreter displays numbers in a standard way. For example:

① The internal representation of 0.1 discussed earlier is rounded to ‘0.1’, which is the
shortest number with this representation.

② Numbers smaller in magnitude than 0.0001 are displayed in scientific notation.

A number of one type can be created from a number of another type with the relevant
constructor:

>>> float(5)
5.0
>>> int(5.2)                                                  ①
5
>>> int(5.9)
5
>>> complex(3.)                                               ②
(3+0j)
>>> complex(0., 3.)                                           ③
3j

① Note that a floating point number is rounded down in casting it into an integer
(more precisely, it is truncated towards zero, which is the same thing for positive numbers).

② Constructing a complex object from a float generates a complex number with the
imaginary part equal to zero.

③ To generate a pure imaginary number, you have to explicitly pass two numbers to
complex with the first, real part, equal to zero.
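
The behaviour of int for negative numbers is worth checking explicitly. The following sketch contrasts int, which discards the fractional part, with math.floor, which always rounds down; the variable names are chosen here for illustration.

```python
import math

for x in (5.9, -5.9):
    # Print the number, its truncated value and its floor side by side.
    print(x, int(x), math.floor(x))

truncated = int(-5.9)         # the fractional part is discarded: -5
floored = math.floor(-5.9)    # the largest integer not exceeding -5.9: -6
```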


2.2.2 Using the Python shell as a calculator 

Basic arithmetic 

With the three basic number types described earlier, it is possible to use the Python shell
as a simple calculator using the operators given in Table 2.1. These are binary operators
in that they act on two numbers (the operands) to produce a third (e.g., 2**3 evaluates
to 8).

Table 2.1 Basic Python arithmetic operators

+     addition
-     subtraction
*     multiplication
/     floating point division
//    integer division
%     modulus (remainder)
**    exponentiation

Python 3 has two types of division: floating point division (/) always returns a floating
point (or complex) number result, even if it acts on integers. Integer division (//) always
rounds down the result to the nearest integer; the type of the resulting number is an int
only if both of its operands are ints; otherwise it returns a float. Some examples
should make this clearer:

Regular floating point division with (/):

>>> 2.7 / 2
1.35
>>> 9 / 2
4.5
>>> 8 / 4
2.0

The last operation returns a float even though both operands are ints. 3

Integer division with (//):

>>> 8 // 4
2
>>> 9 // 2
4
>>> 2.7 // 2
1.0

Note that // can perform integer arithmetic (rounding down) on floating point numbers.

The modulus operator gives the remainder of an integer division:

>>> 9 % 2
1
>>> 4.5 % 3
1.5

Again, the number returned is an int only if both of the operands are ints.
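
Because // rounds down (and not towards zero), its result for negative operands may be surprising. The sketch below, using values chosen here for illustration, shows this and checks the relationship between the two operators, a == (a // b) * b + a % b. The built-in divmod returns both results at once.

```python
a, b = -9, 2
q, r = a // b, a % b     # q is rounded down, towards negative infinity
print(q, r)

# The quotient and remainder together recover the original number.
print(q * b + r == a)

# divmod returns the same pair in a single call.
print(divmod(a, b) == (q, r))
```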

Operator precedence 

Arithmetic operations can be strung together in a sequence, which naturally raises the 
question of precedence: for example, does 2 + 4*3 evaluate to 14 (as 2 + 12) or
18 (as 6 * 3)? Table 2.2 shows that the answer is 14: multiplication has a higher 
precedence than addition and is evaluated first. These precedence rules are overridden 
by the use of parentheses: for example, (2 + 4) * 3 = 18. 


3 This is a major difference from Python 2, in which the / operator performed integer division on two integers. 
Table 2.2 Python arithmetic operator precedence

**              (highest precedence)
*, /, //, %
+, -            (lowest precedence)


Operators of equal precedence are evaluated left to right with the exception of
exponentiation (**), which is evaluated right to left (that is, “top down” when written using
the conventional superscript notation). For example,

>>> 6 / 2 / 4        # the same as 3 / 4
0.75
>>> 6 / (2 / 4)      # the same as 6 / 0.5
12.0
>>> 2**2**3          # the same as 2**(2**3) == 2**8
256
>>> (2**2)**3        # the same as 4**3
64

In examples such as these, the text following the hash symbol, #, is a comment that is
ignored by the interpreter. We shall sometimes use comments in this way to explain more
about a statement, but it is not necessary to type them in if you try out the code.
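
A few further cases can be checked in the same way. Note that the first line below involves the unary minus sign, which is not listed in Table 2.2: in Python, exponentiation binds more tightly than a minus sign on its left, so parentheses are needed to square a negative number. The variable names are arbitrary.

```python
a = -2**2        # parsed as -(2**2)
b = (-2)**2      # parentheses force the negation first
c = 2 + 4 * 3    # multiplication before addition
d = (2 + 4) * 3  # parentheses override precedence
e = 2**3**2      # right to left: 2**(3**2) == 2**9

print(a, b, c, d, e)
```

When in doubt, adding parentheses costs nothing and makes the intended order explicit.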


Methods and attributes of numbers 

Python numbers are objects (in fact, everything in Python is an object) and have
certain attributes, accessed using the “dot” notation: <object>.<attribute> (this use
of the period has nothing to do with the decimal point appearing in a floating point
number). Some attributes are simple values: for example, complex number objects have
the attributes real and imag which are the real and imaginary (floating point) parts of
the number:

>>> (4+5j).real
4.0
>>> (4+5j).imag
5.0

Other attributes are methods: callable functions that act on their object in some way. 4
For example, complex numbers have a method, conjugate, which returns the complex
conjugate:

>>> (4+5j).conjugate()
(4-5j)

Here, the empty parentheses indicate that the method is to be called, that is, the function
to calculate the complex conjugate is to be run on the number 4 + 5j; if we omit them,
as in (4+5j).conjugate, we are referring to the method itself (without calling it) -
this method is itself an object!
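
The difference between referring to a method and calling it can be seen in a short sketch; the names z, conj and result are introduced here for illustration.

```python
z = 4 + 5j

conj = z.conjugate      # no parentheses: a reference to the method itself
print(callable(conj))   # the method object can be called later

result = conj()         # parentheses: the method is now actually called
print(result)
```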


4 In this book, we will use the terms method and function interchangeably. In Python, everything is an object 
and the distinction is not as meaningful as it is in some other languages. 
Integers and floating point numbers don’t actually have very many attributes that it
makes sense to use in this way, but if you’re curious you can find out how many bits an
integer takes up in memory by calling its bit_length method. For example,

>>> (3847298893721407).bit_length()
52

Note that Python allocates as much memory as is necessary to exactly represent the
integer.
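
As a rough check on what bit_length reports, the sketch below verifies that an integer n with bit length k lies between 2**(k-1) and 2**k, and shows that much larger integers are handled in the same way. The value 2**200 is an arbitrary choice for illustration.

```python
n = 3847298893721407
k = n.bit_length()
print(k)

# An integer needing k bits satisfies 2**(k-1) <= n < 2**k.
print(2**(k - 1) <= n < 2**k)

# Integers far beyond 64 bits are represented exactly as well.
big = 2**200
print(big, big.bit_length())
```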

Mathematical functions 

Two of the mathematical functions that are provided “by default” as so-called built-ins 
are abs and round. 

abs returns the absolute value of a number as follows:

>>> abs(-5.2)
5.2
>>> abs(-2)
2
>>> abs(3+4j)
5.0

This is an example of polymorphism: the same function, abs, does different things to
different objects. If passed a real number, x, it returns |x|, the non-negative magnitude of
that number, without regard to sign; if passed a complex number, z = x + iy, it returns
the modulus, $|z| = \sqrt{x^2 + y^2}$.

The round function (with one argument) rounds a floating point number to the
nearest integer:

>>> round(-9.62)
-10
>>> round(7.5)
8
>>> round(4.5)
4

Note that in Python 3, this function employs Banker’s rounding: if a number is midway
between two integers, then the even integer is returned. 5
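
The rounding rule for halfway cases can be checked on a handful of values. The list below is an arbitrary selection of numbers lying exactly midway between two integers (all of them have exact binary representations, so no floating point error is involved).

```python
halves = [0.5, 1.5, 2.5, 3.5, -0.5, -1.5, -2.5]
rounded = [round(x) for x in halves]

# Every halfway case is rounded to the neighbouring even integer.
for x, r in zip(halves, rounded):
    print(x, '->', r)
```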

Python is a very modular language: functionality is available in packages and modules
that are imported if they are needed but are not loaded by default: this keeps the
memory required to run a Python program to a minimum and improves performance. 
For example, many useful mathematical functions are provided by the math module, 
which is imported with the statement 

>>> import math 

The math module concerns itself with floating point and integer operations (for
functions of complex numbers, there is another module, called cmath). These are called
by passing one (or sometimes more than one) number to them inside parentheses (the
numbers are said to act as arguments to the function being called). For example,

>>> import math
>>> math.exp(-1.5)
0.22313016014842982
>>> math.cos(0)
1.0
>>> math.sqrt(16)
4.0

5 In Python 2 the round() function rounds away from zero when two integers are equally close: thus
round(2.5) is 3 but round(-2.5) is -3.

Table 2.3 Some functions provided by the math module. Angular arguments are
assumed to be in radians.

math.sqrt(x)         $\sqrt{x}$
math.exp(x)          $e^x$
math.log(x)          $\ln x$
math.log(x, b)       $\log_b x$
math.log10(x)        $\log_{10} x$
math.sin(x)          $\sin(x)$
math.cos(x)          $\cos(x)$
math.tan(x)          $\tan(x)$
math.asin(x)         $\arcsin(x)$
math.acos(x)         $\arccos(x)$
math.atan(x)         $\arctan(x)$
math.sinh(x)         $\sinh(x)$
math.cosh(x)         $\cosh(x)$
math.tanh(x)         $\tanh(x)$
math.asinh(x)        $\operatorname{arsinh}(x)$
math.acosh(x)        $\operatorname{arcosh}(x)$
math.atanh(x)        $\operatorname{artanh}(x)$
math.hypot(x, y)     The Euclidean norm, $\sqrt{x^2 + y^2}$
math.factorial(x)    $x!$
math.erf(x)          The error function at x
math.gamma(x)        The gamma function at x, $\Gamma(x)$
math.degrees(x)      Converts x from radians to degrees
math.radians(x)      Converts x from degrees to radians

A complete list of the mathematical functions provided by the math module is 
available in the online documentation; 6 the more commonly used ones are listed in 
Table 2.3. 

The math module also provides two very useful nonfunction attributes: math.pi and
math.e give the values of π and e, the base of the natural logarithm, respectively.
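
A brief sketch exercising a few of the functions and constants of the math module. The arguments are chosen here for illustration; math.isclose is assumed to be available (Python 3.5 or later). Note in particular that the sine of math.pi is tiny but not exactly zero, because math.pi is itself only a floating point approximation to π.

```python
import math

print(math.hypot(3, 4))        # hypotenuse of a right triangle with sides 3 and 4
print(math.factorial(5))       # 5! = 5*4*3*2*1
print(math.degrees(math.pi))   # pi radians expressed in degrees
print(math.log(math.e))        # natural logarithm of e

# sin(pi) is not exactly zero because math.pi only approximates pi.
s = math.sin(math.pi)
print(s, math.isclose(s, 0.0, abs_tol=1e-12))
```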

It is possible to import the math module with ‘from math import *’ and access 
its functions directly: 

>>> from math import *
>>> cos(pi)
-1.0


6 http://docs.python.org/3/library/math.html. 
However, although this may be convenient for interacting with the Python shell, it is
not recommended in Python programs. There is a danger of name conflicts (particularly
if many modules are imported in this way), and it becomes difficult to know which function
comes from which module. Importing with import math keeps the functions bound to
their module’s namespace: thus, even though math.cos requires more typing it makes
for code that is much easier to understand and maintain.
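
To see why the namespace matters, consider the sqrt function, which exists in both the math and cmath modules with different behaviour. In the sketch below (the choice of -1 as argument is ours), keeping the module prefix makes it obvious which version is being called; had both modules been imported with ‘from ... import *’, the name sqrt would silently refer to whichever was imported last.

```python
import math
import cmath

# The complex version accepts negative arguments.
print(cmath.sqrt(-1))

try:
    math.sqrt(-1)            # the real-valued version rejects negative input
except ValueError as err:
    error_text = str(err)
    print('math.sqrt(-1) failed:', error_text)
```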


Example E2.2 As might be expected, mathematical functions can be strung together in
a single expression:

>>> import math
>>> math.sin(math.pi/2)
1.0
>>> math.degrees(math.acos(math.sqrt(3)/2))
30.000000000000004

Note the finite precision here: the exact answer is $\arccos(\sqrt{3}/2) = 30^\circ$.

The fact that the int function rounds down in casting a floating point number to an
integer can be used to find the number of digits a positive integer has:

>>> int(math.log10(9999)) + 1
4
>>> int(math.log10(10000)) + 1
5
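
The digit-counting idea can be wrapped up and compared against an exact count obtained from the integer's string representation. This is only a sketch: the function name num_digits is hypothetical, and because math.log10 works in floating point, the result may not be reliable for very large integers, for which len(str(n)) is the safer choice.

```python
import math

def num_digits(n):
    """Return the number of decimal digits of the positive integer n."""
    return int(math.log10(n)) + 1

for n in (1, 9, 10, 9999, 10000, 123456):
    # len(str(n)) counts the digits exactly and serves as a check.
    print(n, num_digits(n), len(str(n)))
```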


2.2.3 Variables 

What is a variable? 

When an object, such as a float, is created in a Python program or using the Python
shell, memory is allocated for it: the location of this memory within the computer’s
architecture is called its address. The value of an object’s address isn’t actually
very useful in Python, but if you’re curious you can find it out by calling the id built-in
method:

>>> id(20.1)
4297273888       # for example

This number refers to a specific location in memory that has been allocated to hold the
float object with the value 20.1.

For anything beyond the most basic usage, it is necessary to store the objects that are 
involved in a calculation or algorithm and to be able to refer to them by some convenient 
and meaningful name (rather than an address in memory). This is what variables are 
for. 7 A variable name can be assigned (“bound”) to any object and used to identify that 
object in future calculations. For example, 


7 In Python, it is arguably better to talk of object identifiers or identifier names rather than variables, but we
will not be too strict about this.
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>>> a = 3 
>>> b = -0.5 
>>> a * b 
-1.5 

In this snippet, we create the int object with the value 3 and assign the variable name
a to it. We then create the float object with the value -0.5 and assign b to it. Finally,
the calculation a * b is carried out: the values of a and b are multiplied together and
the result returned. This result isn’t assigned to any variable, so after being output to the
screen it is thrown away. That is, the memory required to store the result, a float with
the value -1.5, is allocated for long enough for it to be displayed to the user, but then
it is gone. 8 If we need the result for some subsequent calculation, we should assign it to
another variable:

>>> c = a * b 

>>> c 

-1.5 

Note that we did not have to declare the variables before we assign them (tell Python
that the variable name a is to refer to an integer, b is to refer to a floating point number,
etc.), as is necessary in some computer languages. Python is a dynamically typed language
and the necessary object type is inferred from its definition: in the absence of a
decimal point, the number 3 is assumed to be an int; -0.5 looks like a floating point
number and so Python defines b to be a float. 9
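As a brief illustration of dynamic typing, the sketch below binds one name to objects of different types in turn. The names x, first_type and second_type are arbitrary choices for this example; the built-in type is described in Example E2.4.

```python
# The same name can be bound to objects of different types in turn.
x = 3
first_type = type(x)      # int: there is no decimal point in the literal

x = -0.5
second_type = type(x)     # float: the literal contains a decimal point

x = x * 2                 # arithmetic on a float gives another float
print(first_type, second_type, x)
```

No declaration is needed at any point: the type belongs to the object, not to the name that refers to it.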

Variable names 

There are some rules about what makes a valid variable name: 

• Variable names are case-sensitive: a and A are different variables;

• Variable names can contain any letter, the underscore character (_) and any digit
(0-9) ...

• ... but must not start with a digit;

• A variable name must not be the same as one of the reserved keywords given in
Table 2.4;

• The built-in constant names True, False and None cannot be assigned as variable
names.
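These rules can be checked programmatically. The sketch below uses the string method isidentifier and the standard library's keyword module, neither of which has been introduced in the text so far; the function name is_valid_name is our own, chosen for illustration.

```python
import keyword

def is_valid_name(name):
    """Return True if name could be used as a variable name."""
    # str.isidentifier() checks the character rules (letters, digits and
    # underscores, not starting with a digit); keyword.iskeyword() rejects
    # reserved words such as lambda, and also True, False and None.
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ('area', 'mean_height', '2nd_value', 'lambda', 'None', 'lam'):
    print(candidate, is_valid_name(candidate))
```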

Most of the reserved keywords are pretty unlikely choices for variable names, with 
the exception of lambda. Python programmers often use lam if they need to use it. 
A good text editor will highlight the keywords as you type your program, so this is 
unlikely to cause confusion. 

It is possible to give a variable the same name as a built-in function (e.g., abs and 
round), but that built-in function will no longer be available after such an assignment, 


8 Actually in an interactive Python session the result of the last calculation is stored in the special variable
called _ (the underscore), so it isn’t really thrown away until overwritten by the next calculation.

9 This is sometimes called duck-typing after the phrase attributed to James Whitcomb Riley: “When I see a
bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.”
Table 2.4 Python 3 reserved keywords

and       as        assert    break     class     continue
def       del       elif      else      except    finally
for       from      global    if        import    in
is        lambda    nonlocal  not       or        pass
raise     return    try       while     with      yield



Note: in Python 2, exec is a keyword but nonlocal is not. 


so this is probably best avoided - luckily, most have names that are unlikely to be chosen 
in practice. 10 

In addition to the rules mentioned earlier, there are certain style considerations that 
dictate good practice in naming variables: 

• Variable names should be meaningful (area is better than a) ...

• ... but not too long (the_area_of_the_triangle is unwieldy);

• Generally, don’t use I (uppercase i), l (lowercase L) or the uppercase letter O:
they look too much like the digits 1 and 0;

• The variable names i, j and k are usually used as integer counters;

• Use lowercase names, with words separated by underscores rather than ‘CamelCase’:
for example, mean_height and not MeanHeight. 11

These and many other rules and conventions are codified in a style guide called PEP8 
which forms part of the Python documentation 12 (see also Section 9.3.1). 

Breaking these style rules will not result in your program failing to run, but it might
make it harder to maintain and debug - the person you help might be yourself! 


Example E2.3 Heron’s formula gives the area, A, of a triangle with sides a, b, c as:
$A = \sqrt{s(s - a)(s - b)(s - c)}$, where $s = \frac{1}{2}(a + b + c)$.

For example, 

>>> a = 4.503 
>>> b = 2.377 
>>> c = 3.902 
>>> s = (a + b + c) / 2 

➊ >>> area = math.sqrt(s * (s - a) * (s - b) * (s - c))

>>> area

4.63511081571606

➊ Don’t forget to import math if you haven’t already in this Python session.
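The same calculation can be collected into a reusable function. This is a minimal sketch: functions are not covered until later in the book, and the name triangle_area is our own choice for illustration.

```python
import math

def triangle_area(a, b, c):
    """Return the area of a triangle with sides a, b, c by Heron's formula."""
    s = (a + b + c) / 2                      # the semi-perimeter
    return math.sqrt(s * (s - a) * (s - b) * (s - c))

print(triangle_area(4.503, 2.377, 3.902))    # the triangle of Example E2.3
print(triangle_area(3, 4, 5))                # a right-angled triangle
```

The 3-4-5 right-angled triangle is a convenient check, since its area is half the product of the two shorter sides.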


10 For a complete list of built-in function names, see http://docs.python.org/3/library/functions.html.

11 CamelCase in Python is usually reserved for class names: see Section 4.6.2. 

12 http://legacy.python.org/dev/peps/pep-0008/. 
Table 2.5 Python comparison operators

==      equal to
!=      not equal to
>       greater than
<       less than
>=      greater than or equal to
<=      less than or equal to


Example E2.4 The data type and memory address of the object referred to by a variable 
name can be found with the built-ins type and id: 


>>> type(a) 
<class 'float'>

>>> id(area) 

4298539728 # for example 


2.2.4 Comparisons and logic 

Operators 

The main comparison operators that are used in Python to compare objects (such as 
numbers) are given in Table 2.5. 


The result of a comparison is a boolean object (of type bool) which has exactly
one of two values: True or False. These are built-in constant keywords and cannot be 
reassigned to other values. 13 For example, 

>>> 7 == 8 
False 

>>> 4 >= 3.14 
True 

Python is able, as far as possible without ambiguity, to compare objects of different 
types: the integer 4 is promoted to a float for comparison with the number 3.14.

Note the importance of the difference between == and =. The single equals sign is an 
assignment, which does not return a value: the statement a=7 assigns the variable a to 
the integer object 7 and that is all, whereas the expression a==7 is a test: it returns True
or False depending on the value of a. 14 

Care should be taken in comparing floating point numbers for equality. Because they
are not stored exactly and calculations involving them frequently lead to a loss of
precision, this can give unexpected results to the unwary. For example,

>>> a = 0.01 
>>> b = 0.1**2
>>> a == b 
False 


13 In Python 2, however, unhelpful assignments such as True = False were allowed. 

14 In some languages, such as C, assignment returns the value of whatever is being assigned, which can lead 
to some nasty and hard-to-find bugs when = is mistakenly used as a comparison operator. 
In this example, 0.01 cannot be represented exactly as a floating point number but is
(on my system) stored as a binary number equivalent to 0.010000000000000000208;
the result of squaring the floating point representation of 0.1 on the other hand is
0.01000000000000000194, and these two numbers are not the same. See Section 9.1
for more information.
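One common way around this difficulty, not discussed in the text above, is to test whether two floats agree to within a tolerance. The sketch below uses math.isclose, which is available from Python 3.5; the variable names are arbitrary.

```python
import math

a = 0.01
b = 0.1**2

exactly_equal = (a == b)             # compares the stored binary values
nearly_equal = math.isclose(a, b)    # default relative tolerance of 1e-09

print(exactly_equal, nearly_equal)
```

The default tolerance is relative, so comparisons with zero need the abs_tol argument to be set explicitly.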

Logic operators 

Comparisons can be modified and strung together with the logic operator keywords and, 
not and or. See Tables 2.6, 2.7 and 2.8. For example, 

>>> 7.0 > 4 and -1 >= 0 # equivalent to True and False

False

>>> 5 < 4 or 1 != 2 # equivalent to False or True

True

In compound expressions such as these, the comparison operators are evaluated first, 
and then the logic operators in order of precedence: not, and, or. This precedence is 
overridden with parentheses, as for arithmetic. Thus, 

>>> not 7.5 < 0.9 or 4 == 4 
True 

>>> not (7.5 < 0.9 or 4 == 4) 

False 


Table 2.6 Truth table for the not operator

P        not P
True     False
False    True


Table 2.7 Truth table for the and operator

P        Q        P and Q
True     True     True
False    True     False
True     False    False
False    False    False

Table 2.8 Truth table for the or operator

P        Q        P or Q
True     True     True
False    True     True
True     False    True
False    False    False
The truth tables for the logic operators are given in Tables 2.6, 2.7 and 2.8; note that, in common with most
languages, or in Python is the inclusive or variant, for which A or B is True if both A
and B are True, rather than the exclusive or operator (A xor B is True only if one but
not both of A and B is True).
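The truth tables for and and or can be generated rather than memorized. The following sketch loops over every combination of two boolean values; it relies on itertools.product from the standard library and on for loops and string formatting, all of which are introduced later, and the name rows is our own.

```python
from itertools import product

# Collect one row (P, Q, P and Q, P or Q) for each combination of values.
rows = []
for p, q in product((True, False), repeat=2):
    rows.append((p, q, p and q, p or q))

print('P      Q      P and Q  P or Q')
for p, q, p_and_q, p_or_q in rows:
    print('{!s:<6} {!s:<6} {!s:<8} {!s}'.format(p, q, p_and_q, p_or_q))
```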


♦ Boolean equivalents and conditional assignment

In a logic test expression, it is not always necessary to make an explicit comparison to 
obtain a boolean value: Python will try to convert an object to a bool type if needed. 
For numeric objects, 0 evaluates to False and any nonzero value is True: 

>>> a = 0 

>>> a or 4 < 3 # same as: False or 4 < 3 

False 
>>> 

>>> not a+1 # same as: not True 

False 

In this last example, addition has higher precedence than the logic operator not, so 
a+1 is evaluated first to give 1. This corresponds to boolean True, and so the whole
expression is equivalent to not True. To explicitly convert an object to a boolean 
object, use the bool constructor: 

>>> bool(-1) 

True 

>>> bool(0.0) 

False 

In fact, the and and or operators always return one of their operands and not just its 
bool equivalent. So, for example: 

>>> a = 0 
➊ >>> a-2 or a
-2

➋ >>> 4 > 3 and a-2

-2

➌ >>> 4 > 3 and a

0

Logic expressions are evaluated left to right, and those involving and or or are short-circuited:
the second expression is only evaluated if necessary to decide the truth value
of the whole expression. The three examples presented here can be analyzed as follows:

➊ In the first example, a-2 is evaluated first: this is equal to -2, which is equivalent
to True, so the or condition is fulfilled and the operand evaluating to True is returned
immediately: -2.

➋ 4 > 3 is True, so the second expression must be evaluated to establish the truth of
the and condition. a-2 is equal to -2, which is also equivalent to True, so the and
condition is fulfilled and -2 (as the most recently evaluated expression) is returned.

➌ In the last case, a is 0 which is equivalent to False: the and condition evaluates to
False because of this, and so the return value is 0.
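Short-circuiting can be made visible by recording which operands are actually evaluated. The sketch below repeats the three examples analyzed above; the helper function probe and the list calls are hypothetical names introduced only for this demonstration, and functions and lists are covered later in the book.

```python
calls = []

def probe(label, value):
    """Record that an operand was evaluated, then return it unchanged."""
    calls.append(label)
    return value

a = 0

# or: the first operand is -2 (equivalent to True), so the second is skipped.
first = probe('a-2', a - 2) or probe('a', a)
first_calls = list(calls)
calls.clear()

# and: the first operand is True, so the second must be evaluated as well.
second = probe('4 > 3', 4 > 3) and probe('a-2', a - 2)
second_calls = list(calls)
calls.clear()

# and: both operands are evaluated and the second, 0, is returned.
third = probe('4 > 3', 4 > 3) and probe('a', a)
third_calls = list(calls)

print(first, first_calls)
print(second, second_calls)
print(third, third_calls)
```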


Downloaded from http:/www.cambridge.org/core. University of Illinois at Urbana - Champaign Library, on 28 Dec 2016 at 09:02:36, subject to the Cambridge Core 
terms of use, available at http:/www.cambridge.org/core/terms. http://dx.doi.Org/1 0.1017/CB09781139871754.002 




Python’s special value, None 

Python defines a single value, None, of the special type, NoneType. It is used to represent
the absence of a defined value, for example, where no value is possible or relevant.
This is particularly helpful in avoiding arbitrary default values (such as 0, 1 or -99) for
bad or missing data.

In a boolean comparison, None evaluates to False, but to test whether or not a 
variable, x, is equal to None, use 

if x is None 

and 

if x is not None 

rather than the shortcuts if x and if not x. 15
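The difference matters when 0 is a legitimate value. In the sketch below, the function describe and its argument reading are hypothetical names chosen for illustration (functions and string formatting are covered later): a reading of zero is kept distinct from a missing one only because the test is made with is None.

```python
def describe(reading):
    """Describe a measurement that may be missing (represented by None)."""
    if reading is None:
        return 'missing'
    return 'measured {}'.format(reading)

print(describe(None))    # no value was recorded
print(describe(0))       # a genuine reading of zero, not a missing one
print(describe(12.5))

# A truthiness test cannot tell these two cases apart:
print(not 0, not None)
```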


Example E2.5 A common Python idiom is to assign a variable using the return value 
of a logic expression: 

>>> a = 0 
>>> b = a or -1 
>>> b 
-1 

That is (for a understood to be an integer): “set b equal to the value of a unless a==0,
in which case set b equal to -1.”


2.2.5 Immutability and identity 

The objects presented so far, such as integers and booleans, are immutable. Immutable 
objects do not change after they are created, though a variable name may be reassigned 
to refer to a different object from the one it was originally assigned to. For example, 
consider the assignments: 

>>> a = 8 
>>> b = a 

The first line creates the integer object with value 8 in memory, and assigns the name
a to it. The second line assigns the name b to the same object. You can see this by
inspecting the address of the object referred to by each name:

>>> id(a) 

4297273504 
>>> id(b) 

4297273504 

Thus, a and b are references to the same integer object. Now suppose a is reassigned to 
a new number object: 


15 Recall that not x also evaluates to True if x is any of 0, False or the empty string and so is not a very
reliable way to test specifically if x is not set to None. 
>>> a = 3.14
>>> a 
3.14 
>>> b 
8 

>>> id(a) 

4298630152 
>>> id(b) 

4297273504 

Note that the value of b has not changed: this variable still refers to the original 8.
The variable a now refers to a new, float object with the value 3.14 located at a
new address. This is what is meant by immutability: it is not the “variable” that cannot
change but the immutable object itself. This is illustrated in Figure 2.1.
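The sequence of assignments above can be condensed into a short script that compares addresses directly instead of reading them by eye. This is only a sketch; the names same_before and same_after are our own.

```python
a = 8
b = a                           # b is bound to the same object as a
same_before = id(a) == id(b)

a = 3.14                        # rebind a; the integer 8 itself is untouched
same_after = id(a) == id(b)

print(same_before, same_after, b)
```

The actual addresses differ from one session to the next, which is why the comparison is more useful than the numbers themselves.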

A more convenient way to establish if two variables refer to the same object is to use 
the is operator, which determines object identity: 

>>> a = 2432 
>>> b = a 
>>> a is b 
True 

>>> c = 2432 
>>> c is a 
False 

>>> c == a 
True 

Here, the assignment c = 2432 creates an entirely new integer object so c is a
evaluates as False, even though a and c have the same value. That is, the two variables 
refer to different objects with the same value. 

It is often necessary to change the value of a variable in some way, such as 

>>> a = 800 
>>> a = a + 1 
>>> a 
801 

The integers 800 and 801 are immutable: the line a = a + 1 creates a new integer
object with the value 801 (the right-hand side is evaluated first) and assigns it to the



Figure 2.1 (a) Two variables referring to the same integer; (b) after reassigning the value of a. 
variable name a (the old 800 is forgotten 16 unless some other variable refers to it). That
is, a points to a different address before and after this statement.

This reassignment of a variable by an arithmetic operation on its value is so common
that there is a useful shorthand notation: the augmented assignment a += 5 is
the same as a = a + 5. The operators -=, *=, /=, //=, %= work in the same way.
C-style increment and decrement operations such as a++ for a += 1 are not supported
in Python, however. 17
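As a quick check of how the augmented assignment operators behave, the sketch below applies each of them in turn to the same variable; the starting value and the operands are arbitrary choices.

```python
a = 800
a += 5      # same as a = a + 5:   805
a -= 5      # same as a = a - 5:   800
a *= 2      # same as a = a * 2:   1600
a //= 3     # same as a = a // 3:  533
a %= 10     # same as a = a % 10:  3
a /= 2      # same as a = a / 2:   1.5 (true division gives a float)
print(a)
```

Each statement creates a new number object and rebinds the name a to it, exactly as the longer form would.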


Example E2.6 Python provides the operator is not: it is more natural to use c is 
not a than not c is a. 

>>> a = 8 
>>> b = a 
>>> b is a 
True 

>>> b /= 2 

>>> b is not a 
True 


♦ Example E2.7 Given the previous discussion, it might come as a surprise to find that

>>> a = 256 
>>> b = 256 
>>> a is b 
True 

This happens because Python keeps a cache of commonly used, small integer objects 
(on my system, the numbers -5 to 256). To improve performance, the assignment a =
256 attaches the variable name a to the existing integer object without having to allocate
new memory for it. Because the same thing happens with b, the two variables in this 
case do, in fact, point to the same object. By contrast, 

>>> a = 257 
>>> b = 257 
>>> a is b 
False 


2.2.6 Exercises 
Questions 

Q2.2.1 Predict the result of the following expressions and check them using the Python
shell. 

a. 2.7/2 

b. 2/4-1 


16 That is, the memory assigned for it by Python is reclaimed (“garbage-collected”) for general use. 

17 Assignment and augmented assignment in Python are statements not expressions and so do not return a
value and cannot be chained together. 
c. 2//4-1

d. (2 + 5) % 3

e. 2 + 5 % 3

f. 3 * 4 // 6

g. 3 * (4 // 6)

h. 3 * 2 ** 2

i. 3 ** 2 * 2


Q2.2.2 The operators listed in Table 2.1 are all binary operators: they take two 
operands (numbers) and return a single value. The symbol - is also used as a unary
operator, which returns the negative value of the single operand on which it acts. For 
example, 

>>> a = 4 
>>> b = -a 
>>> b 
-4 

Note that the expression b = -a (which sets the variable b to the negative value of a) is
different from the expression b -= a (which subtracts a from b and stores the result in
b). The unary - operator has a higher precedence than *, / and % but a lower precedence
than exponentiation (**), so that, for example, -2 ** 4 is -16 (i.e., $-(2^4)$, not $(-2)^4$).
Predict the result of the following expressions and check them using the Python shell.

a. -2 ** 2

b. 2 ** -2

c. -2 ** -2

d. 2 ** 2 ** 3

e. 2 ** 3 ** 2

f. -2 ** 3 ** 2

g. (-2) ** 3 ** 2

h. (-2) ** 2 ** 3

Q2.2.3 Predict and explain the results of the following statements. 

a. 9 + 6j / 2

b. complex(4, 5).conjugate().imag

c. complex(0, 3j)

d. round(2.5)

e. round(-2.5)

f. abs(complex(5, -4)) == math.hypot(4, 5)

Q2.2.4 Determine the value of $i^i$ as a real number, where $i = \sqrt{-1}$.

Q2.2.5 Explain the (surprising?) behavior of the following short code: 

>>> d = 8 
>>> e = 2 

>>> from math import * 
>>> sqrt(d ** e)
16.88210319127114 


Q2.2.6 Formally, the integer division, a // b, is defined as the floor of a/b (sometimes
written $\lfloor a/b \rfloor$) - that is, the largest integer less than or equal to a/b. The modulus or
remainder, a % b (written a mod b), is then


$a \bmod b = a - b \lfloor a/b \rfloor$



Use these definitions to predict the result of the following expressions and check them
using the Python shell. 


a. 7 // 4 

b. 7 % 4 

c. -7 // 4

d. -7 % 4 

e. 7 // -4 

f. 7 % -4 

g. -7 // -4 

h. -7 % -4 


Q2.2.7 If two adjacent sides of a regular, six-sided die have the values a and b when
viewed side-on and read left to right, the value on the top of the die is given by
$3(a^3 b - a b^3) \bmod 7$.

Determine the value on the top of the die if (a) a = 2, b = 6, (b) a = 3, b = 5. 

Q2.2.8 How many times must a sheet of paper (thickness, t = 0.1 mm but otherwise 
any size required) be folded to reach the moon (distance from Earth, d = 384,400 km)? 

Q2.2.9 Predict the results of the following expressions and check them using the 
Python shell. 

a. not 1 < 2 or 4 > 2 

b. not (1 < 2 or 4 > 2) 

c. 1 < 2 or 4 > 2

d. 4 > 2 or 10/0 == 0 

e. not 0 < 1 

f. 1 and 2 

g. 0 and 1 

h. 1 or 0 

i. type(complex(2, 3).real) is int

Q2.2.10 Explain why the following expression does not evaluate to 100. 

>>> 10^2 
8 

Hint: refer to the Python documentation for bitwise operators. 




Problems 


P2.2.1 There is no exclusive-or operator provided “out of the box” by Python, but one 
can be constructed from the existing operators. Devise two different ways of doing this. 
The truth table for the xor operator is given in Table 2.9. 

P2.2.2 Some fun with the math module: 

a. What is special about the numbers sin and (jt + 20) ! ? 

b. What happens if you try to evaluate an expression, such as $e^{1000}$, which generates
a number larger than the largest floating point number that can be represented 
in the default double precision? What if you restrict your calculation to integer 
arithmetic (e.g., by evaluating 1000!)? 

c. What happens if you try to perform an undefined mathematical operation such as 
division by zero? 

d. The maximum representable floating point number in IEEE 754 double precision
is about $1.8 \times 10^{308}$. Calculate the length of the hypotenuse of a right angled
triangle with opposite and adjacent sides $1.5 \times 10^{200}$ and $3.5 \times 10^{201}$ (i) using the
math.hypot() function directly and (ii) without using this function.

P2.2.3 Some languages provide a sign(a) function which returns -1 if its argument,
a, is negative and 1 otherwise. Python does not provide such a function, but the math
module does include a function math.copysign(x, y), which returns the absolute
value of x with the sign of y. How would you use this function in the same way as the
missing sign(a) function?


P2.2.4 The World Geodetic System is a set of international standards for describing 
the shape of the Earth. In the latest WGS-84 revision, the Earth’s geoid is approximated 
to a reference ellipsoid that takes the form of an oblate spheroid with semi-major and 
semi-minor axes a = 6378137.0 m and c = 6356752.314245 m respectively. 

Use the formula for the surface area of an oblate spheroid,

$S_\mathrm{obl} = 2\pi a^2 \left(1 + \frac{1 - e^2}{e}\,\mathrm{atanh}(e)\right)$, where $e^2 = 1 - \frac{c^2}{a^2}$,



to calculate the surface area of this reference ellipsoid and compare it with the surface 
area of the Earth assumed to be a sphere with radius 6371 km. 


Table 2.9 Truth table for the xor operator

P        Q        P xor Q
True     True     False
False    True     True
True     False    True
False    False    False
2.3 Python objects I: strings

2.3.1 Defining a string object 

A Python string object (of type str) is an ordered, immutable sequence of characters. 
To define a variable containing some constant text (a string literal), enclose the text in
either single or double quotes:

>>> greeting = "Hello, Sir!"

>>> bye = 'À bientôt'

Strings can be concatenated using either the + operator or by placing them next to each
other on the same line: 

>>> 'abc' + 'def' 

'abcdef' 

>>> 'one ' 'two' ' three' 

'one two three' 

Python doesn’t place any restriction on the length of a line, so a string literal can 
be defined in a single, quoted block of text. However, for ease of reading, it is 
usually a good idea to keep the lines of your program to a fixed maximum length 
(79 characters is recommended). To break up a string over two or more lines of 
code, use the line continuation character, ‘\’, or (better) enclose the string literal in
parentheses: 

>>> long_string = 'We hold these truths to be self-evident,'\ 

... ' that all men are created equal ...'


>>> long_string = ('We hold these truths to be self-evident,' 

... ' that all men are created equal...') 

This defines the variable long_string to hold a single line of text (with no carriage
returns). The concatenation does not insert spaces so they need to be included explicitly 
if they are wanted. The spaces lining up the opening quotes in this example are optional 
but make the code easier to read. 

If your string consists of a repetition of one or more characters, the * operator can be 
used to concatenate them the required number of times: 

>>> 'a'*4
'aaaa'

>>> '-o-'*5
'-o--o--o--o--o-'
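A typical use of string repetition is to draw a rule of the right length under a heading. The sketch below assumes an arbitrary heading text and uses the len built-in, which is described in Section 2.3.5, to count its characters.

```python
title = 'Table of results'
rule = '-' * len(title)     # one dash for each character of the title

print(title)
print(rule)
print('ab' * 3)             # repetition works for longer substrings too
```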

The empty string is defined simply as s = '' (two single quotes) or s = "".
Finally, the built-in function, str converts an object passed as its argument into a 
string according to a set of rules defined by the object itself: 

>>> str(42) 

'42' 

>>> str(3.4e5) 

'340000.0' 

>>> str(3.4e20) 

'3.4e+20' 
For finer control over the formatting of the string representation of numbers, see Section 2.3.7.


Example E2.8 Strings concatenated with the + operator can be repeated with *, but
only if enclosed in parentheses: 

>>> ('a'*4 + 'B')*3 
'aaaaBaaaaBaaaaB' 


2.3.2 Escape sequences 

The choice of quotes for strings allows one to include the quote character itself inside a 
string literal - just define it using the other quote: 

>>> verse = 'Quoth the Raven "Nevermore."'

But what if you need to include both quotes in a string? Or to include more than one 
line in the string? This case is handled by special escape sequences indicated by a 
backslash, \. The most commonly used escape sequences are listed in Table 2.10. For 
example, 

>>> sentence = "He said, \"This parrot's dead.\"" 

➊ >>> sentence

'He said, "This parrot\'s dead."'

➋ >>> print(sentence)

He said, "This parrot's dead."

>>> subjects = 'Physics\nChemistry\nGeology\nBiology'

>>> subjects

'Physics\nChemistry\nGeology\nBiology'

>>> print(subjects) 

Physics 

Chemistry 

Geology 

Biology 

➊ Note that just typing a variable’s name at the Python shell prompt simply echoes its
literal value back to you (in quotes). 


Table 2.10 Common Python escape sequences

Escape sequence    Meaning
\'                 Single quote (')
\"                 Double quote (")
\n                 Linefeed (LF)
\r                 Carriage return (CR)
\t                 Horizontal tab
\b                 Backspace
\\                 The backslash character itself
\u, \U, \N{}       Unicode character (see Section 2.3.3)
\x                 Hex-encoded byte




➋ To produce the desired string including the proper interpretation of special characters,
pass the variable to the print built-in function (see Section 2.3.6).

On the other hand, if you want to define a string to include character sequences such
as ‘\n’ without them being escaped, define a raw string prefixed with r:

>>> rawstring = r'The escape sequence for a new line is \n.' 

>>> rawstring 

'The escape sequence for a new line is \\n.' 

>>> print(rawstring) 

The escape sequence for a new line is \n. 
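Raw strings are convenient wherever backslashes are meant literally. The sketch below uses a made-up Windows-style file path (purely hypothetical) to show that a raw string is the same object value as the equivalent string with each backslash escaped.

```python
escaped = 'C:\\new\\table.txt'   # each backslash doubled to escape it
raw = r'C:\new\table.txt'        # raw string: backslashes taken literally

print(escaped == raw)            # the two literals define the same string
print(len('\n'), len(r'\n'))     # one newline character vs. two characters
```

Without the r prefix or the doubled backslashes, the sequences \n and \t in this path would be interpreted as a linefeed and a tab.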

When defining a block of text including several line endings it is often inconvenient to
use \n repeatedly. This can be avoided by using triple-quoted strings: new lines defined
within strings delimited by """ and ''' are preserved in the string: 18

>>> a = """one
... two
... three"""

>>> print(a) 

one 

two 

three 

This is often used to create “docstrings” which document blocks of code in a program 
(see Section 2.7.1). 


Example E2.9 The \x escape denotes a character encoded by the single-byte hex value 
given by the subsequent two characters. For example, the capital letter ‘N’ has the value
78, which is 4e in hex. Hence, 

>>> '\x4e' 

'N' 

The backspace “character” is encoded as hex 08, which is why '\b' is equivalent to
'\x08': 

>>> 'hello\b\b\b\b\bgoodbye' 

'hello\x08\x08\x08\x08\x08goodbye' 

Sending this string to the print() function outputs the string formed by the sequence
of characters in this string literal:

>>> print('hello\b\b\b\b\bgoodbye')
goodbye 
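The correspondence between characters and their numerical values can be explored with the built-in functions ord, chr and hex, none of which has been introduced in the text so far. A short sketch:

```python
code = ord('N')                     # the integer value of the character
print(code, hex(code))              # decimal and hexadecimal forms

print(chr(0x4e), '\x4e' == 'N')     # from the value back to the character
print(ord('\b'), '\b' == '\x08')    # backspace has the value 8
```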


2.3.3 Unicode 

Python 3 strings are composed of Unicode characters. Unicode is a standard describing
the representation of more than 100,000 characters in just about every human language,
as well as many other specialist characters such as scientific symbols. It does this by


18 It is generally considered better to use three double quotes, """, for this purpose.
assigning a number (code point) to every character; the numbers that make up a string
are then encoded as a sequence of bytes. 19 For a long time, there was no agreed encoding
standard, but the UTF-8 encoding, which is used by Python 3 by default, has emerged
as the most widely used today. 20 If your editor will not allow you to enter a character
directly into a string literal, you can use its 16- or 32-bit hex value or its Unicode
character name as an escape sequence:

>>> '\u00E9' # 16-bit hex value
'é'

>>> '\U000000E9' # 32-bit hex value
'é'

>>> '\N{LATIN SMALL LETTER E WITH ACUTE}' # by name
'é'
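The three escape forms define the same one-character string, and the distinction between a character and its encoded bytes can be seen by encoding it. The sketch below uses the standard library's unicodedata module and the string method encode, neither of which is covered in the text above; the variable names are our own.

```python
import unicodedata

e_acute = '\u00e9'
same = (e_acute == '\U000000e9' == '\N{LATIN SMALL LETTER E WITH ACUTE}')
print(same)

print(unicodedata.name(e_acute))    # look up the official character name

encoded = e_acute.encode('utf-8')   # the bytes used to store the character
print(len(e_acute), len(encoded), encoded)
```

One character here occupies two bytes in UTF-8, whereas the ASCII characters each occupy one.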


Example E2.10 Providing your editor or terminal allows it, and you can type them at 
your keyboard or paste them from elsewhere (e.g, a web browser or word processor), 
Unicode characters can be entered directly into string literals: 

>>> creams = 'Crème fraîche, crème brûlée, crème pâtissière'

Python even supports Unicode variable names, so identifiers can use non-ASCII
characters:

>>> Σ = 4

>>> crème = 'anglaise'

Needless to say, because of the potential difficulty in entering non-ASCII characters
from a standard keyboard and because many distinct characters look very similar, this
is not a good idea.


2.3.4 Indexing and slicing strings 

Indexing (or “subscripting”) a string returns a single character at a given location. Like 
all sequences in Python, strings are indexed with the first character having the index 0;
this means that the final character in a string consisting of n characters is indexed at
n - 1. For example,

>>> a = "Knight" 

>>> a[0]

'K'

>>> a[3]

'g'

The character is returned in a str object of length 1. A non-negative index counts
forward from the start of the string; there is a handy notation for the index of a string
counting backward: a negative index, starting at -1 (for the final character) is used. So,

>>> a = "Knight" 

>>> a[-1]

't' 


19 For a list of code points, see the official Unicode website’s code charts at www.unicode.org/charts/. 

20 UTF-8 encoded Unicode encompasses the venerable 8-bit encoding of the ASCII character set 
(e.g., A=65). 
>>> a[-4]

'i'

It is an error to attempt to index a string outside its length (here, with index greater than
5 or less than -6): Python raises an IndexError:

>>> a[6]

Traceback (most recent call last): 

File "<stdin>", line 1, in <module> 

IndexError: string index out of range 

Slicing a string, s[i:j], produces a substring of a string between the characters
at two indexes, including the first (i) but excluding the second (j). If the first index 
is omitted, 0 is assumed; if the second is omitted, the string is sliced to its end. For 
example, 

>>> a = "Knight" 

>>> a[1:3]

'ni'

>>> a[:3]

'Kni'

>>> a[3:]

'ght'

>>> a[:] 

'Knight' 

This can seem confusing at first, but it ensures that a substring returned
as s[i:j] has length j-i (for positive i, j) and that s[:i] + s[i:] == s. Unlike
indexing, slicing a string outside its bounds does not raise an error:

>>> a = "Knight"

>>> a[3:10]

'ght'

>>> a[10:]

''


To test if a string contains a given substring, use the in operator: 

>>> 'Kni' in 'Knight'

True

>>> 'kni' in 'Knight'

False 
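The two properties of slicing claimed above can be checked exhaustively for a short string. The sketch below relies on range, len and generator expressions with all, which are introduced later in the book; the names split_ok and length_ok are our own.

```python
s = 'Knight'
n = len(s)

# Splitting at any position i and rejoining recovers the original string.
split_ok = all(s[:i] + s[i:] == s for i in range(n + 1))

# For 0 <= i <= j <= n, the slice s[i:j] has length j - i.
length_ok = all(len(s[i:j]) == j - i
                for i in range(n + 1) for j in range(i, n + 1))

print(split_ok, length_ok)

# Slices beyond the end of the string are truncated instead of raising an error.
print(repr(s[3:10]), repr(s[10:]))
```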


Example E2.11 Because of the nature of slicing, s[m:n], n-m is always the length of
the substring. In other words, to return r characters starting at index m, use s[m:m+r].
For example, 

>>> s = 'whitechocolatespaceegg' 

>>> s[:5]

'white'

>>> s[5:14]

'chocolate'

>>> s[14:19]

'space'

>>> s[19:]

'egg'
Example E2.12 The optional, third number in a slice specifies the stride. If omitted,
the default is 1: return every character in the requested range. To return every kth letter,
set the stride to k. Negative values of k reverse the string. For example,

>>> s = 'King Arthur'
>>> s[::2]

'Kn rhr'

>>> s[1::2]

'igAtu'

>>> s[-1:4:-1]

'ruhtrA'

This last slice can be explained as a selection of characters from the last (index -1) 
down to (but not including) the character at index 4, with stride -1 (select every character,
in the reverse direction). 

A convenient way of reversing a string is to slice between default limits (by omitting 
the first and last indexes) with a stride of -1: 

>>> s[::-1]

'ruhtrA gniK'
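One simple application of the reversing slice is a test for palindromes. In the sketch below the function name is_palindrome is our own and the test is case-sensitive; functions are covered later in the book.

```python
def is_palindrome(text):
    """Return True if text reads the same forward and backward."""
    return text == text[::-1]

print(is_palindrome('racecar'))
print(is_palindrome('King Arthur'))
```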


2.3.5 String methods 

Python strings are immutable objects, and so it is not possible to change a string by 
assignment - for example, the following is an error: 

>>> a = 'Knight'
>>> a[0] = 'k'

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

TypeError: 'str' object does not support item assignment

New strings can be constructed from existing strings, but only as new objects. For 
example, 

>>> a += ' Templar'
>>> print(a)

Knight Templar

>>> b = 'Black ' + a[:6]

>>> print(b) 

Black Knight 

To find the number of characters a string contains, use the len built-in function:

>>> a = 'Earth'
>>> len(a) 

5 


String objects come with a large number of methods for manipulating and transforming
them. These are accessed using the usual dot notation we have met already - some
of the more useful ones are listed in Table 2.11. In this and similar tables, text in italics 
is intended to be replaced by a specific value appropriate to the use of the method; italic 
text in [square brackets ] denotes an optional argument. 
Table 2.11 Some common string methods

Method               Description

center(width)        Return the string centered in a string with total number of
                     characters width.
endswith(suffix)     Return True if the string ends with the substring suffix.
startswith(prefix)   Return True if the string starts with the substring prefix.
index(substring)     Return the lowest index in the string containing substring.
lstrip([chars])      Return a copy of the string with any of the leading characters
                     specified by [chars] removed. If [chars] is omitted, any
                     leading whitespace is removed.
rstrip([chars])      Return a copy of the string with any of the trailing characters
                     specified by [chars] removed. If [chars] is omitted, any
                     trailing whitespace is removed.
strip([chars])       Return a copy of the string with leading and trailing characters
                     specified by [chars] removed. If [chars] is omitted, any
                     leading and trailing whitespace is removed.
upper()              Return a copy of the string with all characters in uppercase.
lower()              Return a copy of the string with all characters in lowercase.
title()              Return a copy of the string with all words starting with capitals
                     and other characters in lowercase.
replace(old, new)    Return a copy of the string with each substring old replaced
                     with new.
split([sep])         Return a list (see Section 2.4.1) of substrings from the
                     original string which are separated by the string sep. If sep
                     is not specified, the separator is taken to be any amount of
                     whitespace.
join([list])         Use the string as a separator in joining a list of strings.
isalpha()            Return True if all characters in the string are alphabetic and
                     the string is not empty; otherwise return False.
isdigit()            Return True if all characters in the string are digits and the
                     string is not empty; otherwise return False.


Because these methods each return a new string (remember that strings are immutable
objects), they can be chained together:

>>> s = '-+-Python Wrangling for Beginners'
>>> s.lower().replace('wrangling', 'programming').lstrip('+-')
'python programming for beginners'


Example E2.13 Here are some possible manipulations using string methods: 

>>> a = 'java python c++ fortran'
>>> a.isalpha()
False                                   ❶
>>> b = a.title()
>>> b
'Java Python C++ Fortran'
>>> c = b.replace(' ', '!\n')
>>> c
'Java!\nPython!\nC++!\nFortran'
>>> print(c)
Java!
Python!
C++!
Fortran
>>> c.index('Python')
6                                       ❷
>>> c[6:].startswith('Py')
True
>>> c[6:12].isalpha()
True

❶ a.isalpha() is False because of the spaces and '++'.

❷ Note that \n is a single character.


2.3.6 The print function 

One of the most obvious changes between Python 2 and Python 3 is in the way that 
print works. In the older version of Python, print was a statement that output the 
string representation of a list of objects, separated by spaces: 

>>> ans = 6 

>>> print 'Solve:', 2, 'x =', ans, 'for x' # Python 2 only! 

Solve: 2 x = 6 for x 

(There was also a special syntax for redirecting the output to a file.) Python 3 adopts a
more consistent and flexible approach: print is a built-in function (just like the others
we have met, such as len and round). It takes a list of objects and, optionally, the
arguments end and sep, which specify which characters should end the string and which
characters should be used to separate the printed objects, respectively. Omitting these
additional arguments gives the same result as the old print statement: the object fields
are separated by a single space and the line is ended with a newline character.21 For
example,

>>> ans = 6
>>> print('Solve:', 2, 'x =', ans, 'for x')
Solve: 2 x = 6 for x
>>> print('Solve: ', 2, 'x = ', ans, ' for x', sep='', end='!\n')
Solve: 2x = 6 for x!
>>> print()                             ❶

>>> print('Answer: x =', ans/2)
Answer: x = 3.0

❶ Note that print() with no arguments just prints the default newline end
character.

To suppress the newline at the end of a printed string, specify end to be the empty
string, end='':

>>> print('A line with no newline character', end='')
A line with no newline character>>>

The chevrons, >>>, at the end of this line form the prompt for the next Python command
to be entered.

21 The specific newline character used depends on the operating system: for example, on a Mac it is ‘\n’, on
Windows it is two characters: ‘\r\n’.
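The effect of sep and end can be inspected by capturing what print writes. The sketch below uses print's file argument (not covered in the text above) with an in-memory text stream from the standard io module standing in for a real file; the names buffer and captured are illustrative.

```python
import io

buffer = io.StringIO()         # an in-memory text stream standing in for a file

# sep separates the printed objects; end replaces the default newline
print('x', 'y', 'z', sep=', ', end=' ...', file=buffer)
print(' done', file=buffer)    # default end: a newline is appended

captured = buffer.getvalue()
print(repr(captured))
```

Printing repr(captured) makes the separators and the single trailing newline visible.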


Example E2.14 print can be used to create simple text tables: 


>>> heading = '| Index of Dutch Tulip Prices |'
>>> line = '+' + '-'*16 + '-'*13 + '+'
>>> print(line, heading, line,
...       '|  Nov 23 1636  |        100  |',
...       '|  Nov 25 1636  |        673  |',
...       '|  Feb  1 1637  |       1366  |', line, sep='\n')
+-----------------------------+
| Index of Dutch Tulip Prices |
+-----------------------------+
|  Nov 23 1636  |        100  |
|  Nov 25 1636  |        673  |
|  Feb  1 1637  |       1366  |
+-----------------------------+

2.3.7 String formatting 

Introduction to Python 3 string formatting 

In its simplest form, it is possible to use a string’s format method to insert objects into 
it. The most basic syntax is 

>>> '{} plus {} equals {}'.format(2, 3, 'five')
'2 plus 3 equals five'

Here, the format method is called on the string literal with the arguments 2, 3 and
'five', which are interpolated, in order, into the locations of the replacement fields,
indicated by braces, {}. Replacement fields can also be numbered or named, which
helps with longer strings and allows the same value to be interpolated more than once:22

>>> '{1} plus {0} equals {2}'.format(2, 3, 'five')
'3 plus 2 equals five'
>>> '{num1} plus {num2} equals {answer}'.format(num1=2, num2=3, answer='five')
'2 plus 3 equals five'
>>> '{0} plus {0} equals {1}'.format(2, 2+2)
'2 plus 2 equals 4'

Note that numbered fields can appear in any order and are indexed starting at 0. 

Replacement fields can be given a minimum size within the string by the inclusion of 
an integer length after a colon as follows: 

>>> '=== {0:12} ==='.format('Python')
'=== Python       ==='


22 This type of string formatting was introduced into Python 2 as well, although only Python 2.7 supports 
unnamed replacement fields denoted by empty braces, {}. 




If the string is too long for the minimum size, it will take up as many characters as 
needed (overriding the replacement field size specified): 

>>> 'A number: <{0:2}>'.format(-20) 

'A number: <-20>' # -20 won't fit into 2 characters: 3 are used anyway 

By default, the interpolated string is aligned to the left; this can be modified to align to
the right or to center the string. The single characters <, > and ^ control the alignment:

>>> '=== {0:<12} ==='.format('Python')
'=== Python       ==='
>>> '=== {0:>12} ==='.format('Python')
'===       Python ==='
>>> '=== {0:^12} ==='.format('Python')
'===    Python    ==='

In these examples, the field is padded with spaces, but this fill character can also be
specified. For example, to pad with hyphens in the last example, specify

>>> '=== {0:-^12} ==='.format('Python')
'=== ---Python--- ==='

It is even possible to pass the minimum field size as a parameter to be interpolated. Just 
replace the field size with a reference in braces as follows: 

>>> a = 15
>>> 'This field has {0} characters: ==={1:>{2}}===.'.format(a, 'the field', a)
'This field has 15 characters: ===      the field===.'

Or with named interpolation:

>>> 'This field has {w} characters: ==={0:>{w}}===.'.format('the field', w=a)
'This field has 15 characters: ===      the field===.'

In each case, the second format specifier here has been taken to be :>15.

To insert the brace characters themselves into a formatted string, they must be doubled
up: use ‘{{’ and ‘}}’.
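A short sketch of brace doubling, combining literal braces with numbered and named replacement fields (the template text is invented for illustration):

```python
# '{{' and '}}' produce literal braces; {0}, {1} and {n} are replacement fields
template = 'The set {{{0}, {1}}} has {n} elements'
text = template.format('a', 'b', n=2)
print(text)
```

Reading the template from the left, '{{' is consumed first as a literal brace, so '{{{0}' yields an opening brace followed by the first argument.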

Formatting numbers 

The Python 3 string format method provides a powerful way to format numbers. 

The specifiers ‘d’, ‘b’, ‘o’, ‘x’/‘X’ indicate a decimal, binary, octal and lowercase/
uppercase hex integer respectively:

>>> a = 254
>>> 'a = {0:5d}'.format(a)      # decimal
'a =   254'
>>> 'a = {0:10b}'.format(a)     # binary
'a =   11111110'
>>> 'a = {0:5o}'.format(a)      # octal
'a =   376'
>>> 'a = {0:5x}'.format(a)      # hex (lowercase)
'a =    fe'
>>> 'a = {0:5X}'.format(a)      # hex (uppercase)
'a =    FE'

Numbers can be padded with zeros to fill out the specified field size by prefixing the
minimum width with a 0:

>>> a = 254
>>> 'a = {a:05d}'.format(a=a)
'a = 00254'

By default, the sign of a number is only output if it is negative. This behavior can also 
be customized by specifying, before the minimum width: 

• ‘+’: always output a sign;

• ‘-’: only output a negative sign, the default; or

• ‘ ’: output a leading space only if the number is positive.

This last option enables columns of positive and negative numbers to be lined up nicely: 

>>> print('{0: 5d}\n{1: 5d}\n{2: 5d}'.format(-4510, 1001, -3026))
-4510
 1001
-3026
>>> a = -25
>>> b = 12
>>> s = '{0:+5d}\n{1:+5d}\n= {2:+3d}'.format(a, b, a+b)
>>> print(s)
  -25
  +12
= -13

There are also format specifiers for floating point numbers, which can be output
to a chosen precision if desired. The most useful options are ‘f’: fixed-point notation,
‘e’/‘E’: exponent (i.e., “scientific”) notation, and ‘g’/‘G’: a general format which
uses scientific notation for very large and very small numbers.23 The desired precision
(number of decimal places) is specified as ‘.p’ after the minimum field width. Some
examples:

>>> a = 1.464e-10
>>> '{0:g}'.format(a)
'1.464e-10'
>>> '{0:10.2E}'.format(a)
'  1.46E-10'
>>> '{0:15.13f}'.format(a)
'0.0000000001464'
>>> '{0:10f}'.format(a)
'  0.000000'                            ❶

❶ Note that Python will not protect you from this kind of rounding to zero if not enough
space is provided for a fixed-point number.
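To make the rounding hazard concrete, the sketch below formats the same small number with the ‘f’, ‘e’ and ‘g’ specifiers side by side. The variable names are ours.

```python
a = 1.464e-10

fixed = '{0:10f}'.format(a)      # fixed-point, default precision of 6 decimal places
sci = '{0:10.3e}'.format(a)      # exponent notation, 3 decimal places
general = '{0:g}'.format(a)      # general format picks exponent notation here

print(repr(fixed))
print(repr(sci))
print(repr(general))
```

Only the fixed-point form loses the value entirely; for quantities that span many orders of magnitude, ‘e’ or ‘g’ is usually the safer choice.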

23 More specifically, the g/G specifier acts like f/F for numbers between 10^-4 and 10^p, where p is the desired
precision (which defaults to 6), and acts like e/E otherwise.

Older C-style formatting

Python 3 also supports the less powerful, C-style format specifiers that are still in
widespread use. In this formulation the replacement fields are specified with the
minimum width and precision specifiers following a % sign. The objects whose values are to
be interpolated are then given after the end of the string, following another % sign. They
must be enclosed in parentheses if there is more than one of them. The same letters for
the different output types are used as earlier; strings must be specified explicitly with
‘%s’. For example,

>>> kB = 1.3806504e-23
>>> 'Here\'s a number: %10.2e' % kB
"Here's a number:   1.38e-23"

>>> 'The same number formatted differently: %7.1e and %12.6e' % (kB, kB) 

'The same number formatted differently: 1.4e-23 and 1.380650e-23' 

>>> '%s is %g J/K' % ("Boltzmann's constant", kB) 

"Boltzmann's constant is 1.38065e-23 J/K" 


Example E2.15 Python can produce string representations of numbers for which thou- 
sands are separated by commas: 

>>> '{:11,d}'.format(1000000)
'  1,000,000'
>>> '{:11,.1f}'.format(1000000.)
'1,000,000.0'

Here is another table, produced using several different string methods: 

title = '|' + '{:^51}'.format('Cereal Yields (kg/ha)') + '|'
line = '+' + '-'*15 + '+' + ('-'*8 + '+')*4
row = '| {:<13} |' + ' {:6,d} |'*4
header = '| {:^13s} |'.format('Country') + (' {:^6d} |'*4).format(1980, 1990,
                                                                   2000, 2010)
print('+' + '-'*(len(title)-2) + '+',
      title,
      line,
      header,
      line,
      row.format('China', 2937, 4321, 4752, 5527),
      row.format('Germany', 4225, 5411, 6453, 6718),
      row.format('United States', 3772, 4755, 5854, 6988),
      line,
      sep='\n')

+---------------------------------------------------+
|               Cereal Yields (kg/ha)               |
+---------------+--------+--------+--------+--------+
|    Country    |  1980  |  1990  |  2000  |  2010  |
+---------------+--------+--------+--------+--------+
| China         |  2,937 |  4,321 |  4,752 |  5,527 |
| Germany       |  4,225 |  5,411 |  6,453 |  6,718 |
| United States |  3,772 |  4,755 |  5,854 |  6,988 |
+---------------+--------+--------+--------+--------+


2.3.8 Exercises 
Questions 

Q2.3.1 Slice the string s = 'seehemewe' to produce the following substrings:

a. 'see'

b. 'he'



c. 'me' 

d. 'we' 

e. 'hem' 

f. 'meh'

g. 'wee' 

Q2.3.2 Write a single-line expression for determining if a string is a palindrome (reads 
the same forward as backward). 


Q2.3.3 Predict the results of the following statements and check them using the Python 
shell. 

>>> days = 'Sun Mon Tues Weds Thurs Fri Sat'

a. print(days[days.index('M'):])

b. print(days[days.index('M'):days.index('Sa')].rstrip())

c. print(days[6:3:-1].lower()*3)

d. print(days.replace('rs', '').replace('s ', ' ')[::4])

e. print(' -*- '.join(days.split()))


Q2.3.4 What is the output of the following code? How does it work? 

>>> suff = 'thstndrdththththththth'
>>> n = 1
>>> print('{:d}{:s}'.format(n, suff[n*2:n*2+2]))
>>> n = 3
>>> print('{:d}{:s}'.format(n, suff[n*2:n*2+2]))
>>> n = 5
>>> print('{:d}{:s}'.format(n, suff[n*2:n*2+2]))

Q2.3.5 Consider the following (incorrect) tests to see if the string s has one of two
values. Explain how these statements are interpreted by Python and give a correct
alternative.

>>> s = 'eggs'
>>> s == ('eggs' or 'ham')
True
>>> s == ('ham' or 'eggs')
False


Problems 

P2.3.1 a. Given a string representing a base-pair sequence (i.e., containing only the 
letters A, G, C and T), determine the fraction of G and C bases in the sequence. 
(Hint: strings have a count method, returning the number of occurrences of a 
substring.) 

b. Using only string methods, devise a way to determine if a nucleotide sequence 
is a palindrome in the sense that it is equal to its own complementary sequence 
read backward. For example, the sequence TGGATCCA is palindromic because
its complement is ACCTAGGT which is the same as the original sequence backward.
The complementary base pairs are (A, T) and (C, G).

P2.3.2 The table that follows gives the names, symbols, values, uncertainties and units
of some physical constants.

Name                      Symbol  Value                  Uncertainty    Units

Boltzmann constant        kB      1.3806504 × 10^-23     2.4 × 10^-29   J K^-1
Speed of light            c       299792458              (def)          m s^-1
Planck constant           h       6.62606896 × 10^-34    3.3 × 10^-41   J s
Avogadro constant         NA      6.02214179 × 10^23     3 × 10^16      mol^-1
Electron magnetic moment  μe      -9.28476377 × 10^-24   2.3 × 10^-31   J/T
Gravitational constant    G       6.67428 × 10^-11       6.7 × 10^-15   N m^2 kg^-2

Defining variables of the form

kB = 1.3806504e-23     # J/K
kB_unc = 2.4e-29       # uncertainty
kB_units = 'J/K'

use the string object’s format method to produce the following output:

a. kB = 1.381e-23 J/K 

b. G = 0.0000000000667428 Nm^2/kg^2

c. Using the same format specifier for each line, 

kB = 1.3807e-23 J/K 

mu_e = -9.2848e-24 J/T 
N_A = 6.0221e+23 mol-1 

c = 2.9979e+08 m/s 

d. Again, using the same format specifier for each line, 

===   G = +6.67E-11 [Nm^2/kg^2] ===
===  μe = -9.28E-24 [      J/T] ===

Hint: the Unicode code point for the lowercase Greek letter mu is U+03BC.

e. (Harder). Produce the output below, in which the uncertainty (one standard deviation)
in the value of each constant is expressed as a number in parentheses relative
to the preceding digits: that is, 6.62606896(33) × 10^-34 means 6.62606896 × 10^-34
± 3.3 × 10^-41.

G = 6.67428(67)e-11 Nm2/kg2
mu_e = -9.28476377(23)e-24 J/T


P2.3.3 Given the elements of a 3 x 3 matrix as the nine variables a11, a12, ..., a33,
produce a string representation of the matrix using formatting methods, (a) assuming
the matrix elements are (possibly negative) real numbers to be given to one decimal
place; (b) assuming the matrix is a permutation matrix with integer entries taking the
values 0 or 1 only. For example,

>>> print(s_a)
[  0.0  3.4 -1.2 ]
[ -1.1  0.5 -0.2 ]
[  2.3 -1.4 -0.7 ]
>>> print(s_b)
[ 0 0 1 ]
[ 0 1 0 ]
[ 1 0 0 ]

P2.3.4 Find the Unicode code points for the planet symbols listed on the NASA web- 
site (http://solarsystem.nasa.gov/multimedia/display.cfm?IM_ID=167) which mostly 
fall within the hex range 2600-26FF: Miscellaneous Symbols (www.unicode.org/ 
charts/PDF/U2600.pdf) and output a list of planet names and symbols. 

2.4 Python objects II: lists, tuples and loops 

2.4.1 Lists 

Initializing and indexing lists 

Python provides data structures for holding an ordered list of objects. In some other 
languages (e.g., C and Fortran) such a data structure is called an array and can hold 
only one type of data (e.g., an array of integers); the core array structures in Python, 
however, can hold a mixture of data types. 

A Python list is an ordered, mutable array of objects. A list is constructed by speci- 
fying the objects, separated by commas, between square brackets, []. For example, 

>>> list1 = [1, 'two', 3.14, 0]
>>> list1
[1, 'two', 3.14, 0]
>>> a = 4
>>> list2 = [2, a, -0.1, list1, True]
>>> list2
[2, 4, -0.1, [1, 'two', 3.14, 0], True]

Note that a Python list can contain references to any type of object: strings, the various
types of numbers, built-in constants such as the boolean value True, and even other
lists. It is not necessary to declare the size of a list in advance of using it. An empty list
can be created with list0 = [].

An item can be retrieved from the list by indexing it (remember Python indexes start
at 0):

>>> list1[2]
3.14
>>> list2[-1]
True
>>> list2[3][1]
'two'

This last example retrieves the second (index: 1) item of the fourth (index: 3) item of
list2. This is valid because the item list2[3] happens to be a list (the one also
identified by the variable name list1), and list1[1] is the string 'two'. In fact,
since strings can also be indexed:

>>> list2[3][1][1]
'w'

To test for membership of a list, the operator in is used, as for strings: 

>>> 1 in list1
True
>>> 'two' in list2
False

This last expression evaluates to False because list2 does not contain the string
literal 'two' even though it contains list1 which does: the in operator does not
recurse into lists-of-lists when it tests for membership.
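If a recursive membership test is wanted, it has to be written explicitly. The following is a minimal sketch using a function and a for loop (both introduced later in this chapter); the name deep_contains is hypothetical and not part of Python.

```python
def deep_contains(item, seq):
    """Return True if item equals any element of seq or of its nested lists."""
    for element in seq:
        if element == item:
            return True
        # Descend only into nested lists, not into strings or other sequences
        if isinstance(element, list) and deep_contains(item, element):
            return True
    return False

list1 = [1, 'two', 3.14, 0]
list2 = [2, 4, -0.1, list1, True]

print('two' in list2)                  # the in operator looks at the top level only
print(deep_contains('two', list2))     # the helper also searches list1
```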

Lists and mutability 

Python lists are the first mutable object we have encountered. Unlike strings, which 
cannot be altered once defined, the items of a list can be reassigned: 

>>> list1
[1, 'two', 3.14, 0]
>>> list1[2] = 2.72
>>> list1
[1, 'two', 2.72, 0]
>>> list2
[2, 4, -0.1, [1, 'two', 2.72, 0], True]

Note that not only has list1 been changed, but list2 (which contains list1 as
an item) has also changed.24 This behavior catches a lot of people out to begin with,
particularly if a list needs to be copied to a different variable.

>>> q1 = [1, 2, 3]
>>> q2 = q1
>>> q1[2] = 'oops'
>>> q1
[1, 2, 'oops']
>>> q2
[1, 2, 'oops']

Here, the variables q1 and q2 refer to the same list, stored in the same memory location,
and because lists are mutable, the line q1[2] = 'oops' actually changes one of the
stored values at that location; q2 still points to the same location and so it appears to
have changed as well. In fact, there is only one list (referred to by two variable names)
and it is changed once. In contrast, integers are immutable, so the following does not
change the value of q[2]:

>>> a = 3 

>>> q = [1, 2, a] 

>>> a = 4 

>>> q 
[1, 2, 3] 


24 Actually, it hasn’t changed: it only ever contained a series of references to objects: the reference to list1
is the same, even though the references within list1 have changed.
Figure 2.2 Two variables referring to the same list: (a) on initialization and (b) after setting
q1[2] = 'oops'.


Figure 2.3 A list defined with q = [1, 2, a] where a=3: (a) on initialization and (b) after
changing the value of a with a=4.


The assignment a=4 creates a whole new integer object, quite independent of the orig- 
inal 3 that ended up in the list q. This original integer object isn’t changed by the 
assignment (integers are immutable) and so the list is unchanged. This distinction is 
illustrated by Figures 2.2, 2.3 and 2.4. 
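The same distinction can be checked directly with the is operator, which tests whether two names refer to one and the same object. This is a small sketch; q3 is an extra name introduced here to show an independent copy.

```python
q1 = [1, 2, 3]
q2 = q1              # a second name for the same list object
q3 = list(q1)        # an independent copy with the same contents

q1[2] = 'oops'       # mutate the shared list in place

print(q2 is q1, q2)  # the change is visible through q2
print(q3 is q1, q3)  # the copy is unaffected

a = 3
q = [1, 2, a]
a = 4                # rebinds the name a; the list still refers to the object 3
print(q)
```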
Figure 2.4 A list defined with q = [1, 2, a] where a=3: (a) on initialization and (b) after
changing the value of q with q[2]=4.


Lists can be sliced in the same way as string sequences: 

>>> q1 = [0., 0.1, 0.2, 0.3, 0.4, 0.5]
>>> q1[1:4]
[0.1, 0.2, 0.3]
>>> q1[::-1]     # return a reversed copy of the list
[0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
>>> q1[1::2]     # striding: returns elements at 1, 3, 5
[0.1, 0.3, 0.5]

Taking a slice copies the data to a new list. Hence, 

>>> q2 = q1[1:4]
>>> q2[1] = 99     # only affects q2
>>> q2
[0.1, 99, 0.3]
>>> q1
[0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
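One caveat is worth a sketch: a slice copies the list's references, not the objects they refer to, so mutable items such as nested lists are still shared between the original and the slice. The names outer and part are illustrative.

```python
outer = [[1, 2], [3, 4], [5, 6]]
part = outer[0:2]        # a new list holding references to the same inner lists

part[0] = 'replaced'     # rebinding an item of the slice: outer is unaffected
part[1][0] = 99          # mutating a shared inner list: visible through outer too

print(part)
print(outer)
```

For lists of immutable items such as numbers and strings, as in the example above, this sharing is never noticeable.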


List methods 

Just as for strings, Python lists come with a large number of useful methods, summa- 
rized in Table 2.12. Because list objects are mutable, they can grow or shrink in place, 
that is, without having to copy the contents to a new object, as we had to do with strings. 
The relevant methods are 

• append: add an item to the end of the list;

• extend: add one or more objects by copying them from another list;25

• insert: insert an item at a specified index; and

• remove: remove a specified item from the list.

Table 2.12 Some common list methods

Method                  Description

append(element)         Append element to the end of the list.
extend(list2)           Extend the list with the elements from list2.
index(element)          Return the lowest index of the list containing element.
insert(index, element)  Insert element at index index.
pop()                   Remove and return the last element from the list.
reverse()               Reverse the list in place.
remove(element)         Remove the first occurrence of element from the list.
sort()                  Sort the list in place.
copy()                  Return a copy of the list.
count(element)          Return the number of elements equal to element in the
                        list.

>>> q = [] 

>>> q.append(4) 

>>> q 
[4] 

>>> q.extend([6, 7, 8]) 

>>> q 

[4, 6, 7, 8]

>>> q.insert(1, 5) # insert 5 at index 1 

>>> q 

[4, 5, 6, 7, 8] 

>>> q.remove(7) 

>>> q 

[4, 5, 6, 8] 

>>> q.index(8) 

3 # the item 8 appears at index 3 

Two useful list methods are sort and reverse, which sort and reverse the list in 
place. That is, they change the list object, but do not return a value: 

>>> q = [2, 0, 4, 3, 1]

>>> q.sort() 

>>> q 

[0, 1, 2, 3, 4] 

>>> q.reverse() 

>>> q 

[4, 3, 2, 1, 0] 

If you do want a sorted copy of the list, leaving it unchanged, you can use the sorted 
built-in function: 


>>> q = ['a', 'e', 'A', 'c', 'b']
>>> sorted(q)
['A', 'a', 'b', 'c', 'e']     # returns a new list
>>> q
['a', 'e', 'A', 'c', 'b']     # the old list is unchanged

25 Actually, any Python object that forms a sequence that can be iterated over (e.g., a string) can be used as
the argument to extend.


By default, sort() and sorted() order the items in a list in ascending order.
Set the optional argument reverse=True to return the items in descending order:

>>> q = [10, 5, 5, 2, 6, 1, 67]

>>> sorted(q, reverse=True) 

[67, 10, 6, 5, 5, 2, 1] 

Python 3, unlike Python 2, does not allow direct comparisons between strings and 
numbers, so it is an error to attempt to sort a list containing a mixture of such types: 

>>> q = [5, '4', 2, 8]

>>> q.sort() 

TypeError: unorderable types: str() < int() 
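One way around this, sketched below, is the optional key argument of sort and sorted (not covered in the text above): it names a function that is applied to each item to produce the value actually compared. The list here is a variation on the one above, and the exact wording of the error message depends on the Python version.

```python
q = [10, '4', 2, 8]

try:
    sorted(q)                      # mixed int/str comparison is an error
except TypeError as err:
    print('Cannot sort directly:', err)

by_value = sorted(q, key=int)      # compare items by their integer value
by_text = sorted(q, key=str)       # compare items by their string form

print(by_value)
print(by_text)
```

The items themselves are not converted: the key function only determines the ordering, so '4' remains a string in both results.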


Example E2.16 The methods append and pop make it very easy to use a list to 
implement the data structure known as a stack: 

>>> stack = [] 

>>> stack.append(1) 

>>> stack.append(2) 

>>> stack.append(3) 

>>> stack.append(4) 

>>> print(stack) 

[1, 2, 3, 4] 

>>> stack.pop()

4 

>>> print(stack) 

[1, 2, 3] 

The end of the list is the top of the stack from which items may be added or removed 
(think of a stack of dinner plates). 
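As a sketch of a stack in use, the function below checks that the brackets in a string are balanced: each opening bracket is pushed with append, and each closing bracket must match the item removed by pop. The function name is_balanced is hypothetical, and the code relies on a dictionary and an if statement, both introduced later in the book.

```python
def is_balanced(text):
    """Return True if every bracket in text is closed in the correct order."""
    pairs = {')': '(', ']': '[', '}': '{'}   # closing bracket -> opening bracket
    stack = []
    for ch in text:
        if ch in '([{':
            stack.append(ch)                 # push an opening bracket
        elif ch in pairs:
            # A closing bracket must match the most recently opened one
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                         # nothing may be left unclosed

print(is_balanced('f(a[0], {b: c})'))
print(is_balanced('(]'))
```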


Example E2.17 The string method split generates a list of substrings from a given
string, split on a specified separator:

>>> s = 'Jan Feb Mar Apr May Jun'
>>> s.split()     # By default, splits on whitespace
['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']

>>> s = "J. M. Brown AND B. Mencken AND R. P. van't Rooden" 

>>> s.split(' AND ') 

['J. M. Brown', 'B. Mencken', "R. P. van't Rooden"] 


2.4.2 Tuples 

The tuple object 

A tuple may be thought of as an immutable list. Tuples are constructed by placing 
the items inside parentheses: 
>>> t = (1, 'two', 3.)
>>> t
(1, 'two', 3.0)

Tuples can be indexed and sliced in the same way as lists but, being immutable, they
cannot be appended to, extended, or have elements removed from them:

>>> t = (1, 'two', 3.)
>>> t[1]
'two'
>>> t[2] = 4
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

Although a tuple itself is immutable, it may contain references to mutable objects such 
as lists. Hence, 

>>> t = (1, ['a', 'b', 'd'], 0)
>>> t[1][2] = 'c'     # OK to change the list within the tuple
>>> t
(1, ['a', 'b', 'c'], 0)

An empty tuple is created with empty parentheses: t0 = (). To create a tuple
containing only one item (a singleton), however, it is not sufficient to enclose the item
in parentheses (which could be confused with other syntactical use of parentheses);
instead, the lone item is given a trailing comma: t = ('one',).

Uses of tuples 

In some circumstances, particularly for simple assignments such as those in the previous 
section, the parentheses around a tuple’s items are not required: 

>>> t = 1, 2, 3 
>>> t 
(1, 2, 3) 

This usage is an example of tuple packing. The reverse, tuple unpacking is a common 
way of assigning multiple variables in one line: 

>>> a, b, c = 97, 98, 99 
>>> b 
98 

This method of assigning multiple variables is commonly used in preference to separate 
assignment statements either on different lines or (very un-Pythonically) on a single 
line, separated by semicolons: 

a = 97; b = 98; c = 99 # Don't do this! 

Tuples are useful where a sequence of items cannot or should not be altered. In the 
previous example, the tuple object only exists in order to assign the variables a, b and 
c. The values to be assigned: 97, 98 and 99 are packed into a tuple for the purpose of 
this statement (to be unpacked into the variables), but once this has happened, the tuple 
object itself is destroyed. As another example, a function (Section 2.7) may return more



than one object: these objects are returned packed into a tuple. If you need any further 
persuading, tuples are slightly faster for many uses than lists. 
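A brief sketch of a function returning several values as a tuple. The helper min_max is hypothetical; min, max and divmod are Python built-ins, and function definitions are covered in Section 2.7.

```python
def min_max(values):
    """Return the smallest and largest items of values as a tuple."""
    return min(values), max(values)     # packed into a tuple automatically

result = min_max([3, 1, 4, 1, 5])
lowest, highest = result                # tuple unpacking

quotient, remainder = divmod(17, 5)     # a built-in that returns a 2-tuple

print(result)
print(lowest, highest, quotient, remainder)
```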


Example E2.18 In an assignment using the ‘=’ operator the right-hand side expression 
is evaluated first. This provides a convenient way to swap the values of two variables 
using tuples: 

a, b = b, a 

Here, the right-hand side is packed into a tuple object, which is then unpacked into the 
variables assigned on the left-hand side. This is more convenient than using a temporary 
variable: 

t = a 
a = b 
b = t 


2.4.3 Iterable objects 

Examples of iterable objects 

Strings, lists and tuples are all examples of data structures that are iterable objects: they
are ordered sequences of items (characters in the case of strings, or arbitrary objects in
the case of lists and tuples) which can be taken one at a time. One way of seeing this is
to use the alternative method of initializing a list (or tuple) using the built-in constructor
methods list() and tuple(). These take any iterable object and generate a list and a
tuple respectively from its sequence of items. For example,

>>> list('hello')
['h', 'e', 'l', 'l', 'o']

>>> tuple([1, 'two', 3]) 

(1, 'two', 3) 

Because the data elements are copied in the construction of a new object using these 
constructor methods, list is another way of creating an independent list object from 
another: 

>>> a = [5, 4, 3, 2, 1]
>>> b = a           # b and a refer to the same list object
>>> b is a
True
>>> b = list(a)     # create an entirely new list object with the same contents as a
>>> b is a
False

Because slices also return a copy of the object references from a sequence, the idiom
b = a[:] is often used in preference to b = list(a).

any and all

The built-in function any tests whether any of the items in an iterable object are
equivalent to True; all tests whether all of them are. For example,

>>> a = [1, 0, 0, 2, 3]

>>> any(a), all(a) 

(True, False) # some (but not all) of a's items are equivalent to True 

>>> b = [ [] , False, 0.] 

>>> any(b), all(b) 

(False, False) # none of b's items is equivalent to True 
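In practice, any and all are most often applied to the results of a comparison. The sketch below builds a list of boolean values with a list comprehension (a construct not yet introduced at this point) and also shows the behavior for an empty list; the data are invented for illustration.

```python
readings = [0.2, 1.5, 3.1, 0.9]
above_one = [r > 1.0 for r in readings]   # a list of boolean values

some_above = any(above_one)               # is at least one reading above 1.0?
all_above = all(above_one)                # are all readings above 1.0?

# For an empty iterable, any() is False and all() is True
empty_any, empty_all = any([]), all([])

print(above_one, some_above, all_above)
print(empty_any, empty_all)
```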


♦ * syntax

It is sometimes necessary to call a function with arguments taken from a list or other
sequence. The * syntax, used in a function call, unpacks such a sequence into positional
arguments to the function (see also Section 2.7). For example, the math.hypot function
takes two arguments, a and b, and returns the quantity $\sqrt{a^2 + b^2}$. If the arguments you
wish to use are in a list or tuple, the following will fail:

>>> import math
>>> t = [3, 4]
>>> math.hypot(t)

Traceback (most recent call last): 

File "<stdin>", line 1, in <module> 

TypeError: hypot expected 2 arguments, got 1 

We tried to call math.hypot() with a single argument (the list object t), which is an
error. We could index the list explicitly to retrieve the two values we need:

>>> t = [3, 4] 

>>> math.hypot(t[0], t[1]) 

5.0 

but a more elegant method is to unpack the list into arguments to the function with *t: 

>>> math.hypot(*t)

5.0 
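The * syntax works with any function that takes positional arguments and with any sequence. A short sketch follows, using the built-ins divmod and print as further examples; the variable names are ours.

```python
import math

t = [3, 4]
h = math.hypot(*t)             # same as math.hypot(3, 4)

pair = (17, 5)
q, r = divmod(*pair)           # same as divmod(17, 5)

print(h, q, r)
print(*'abc', sep='-')         # a string unpacks into its characters
```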

for loops 

It is often necessary to take the items in an iterable object one by one and do something 
with each in turn. Other languages, such as C, require this type of loop to refer to each 
item in turn by its integer index. In Python this is possible, but the more natural and 
convenient way is with the idiom: 

for item in iterable object: 

which yields each element of the iterable object in turn to be processed by the subse- 
quent block of code. For example, 

>>> fruit_list = ['apple', 'melon', 'banana', 'orange']
>>> for fruit in fruit_list:
...     print(fruit)
...
apple
melon
banana
orange

Each item in the list object fruit_list is taken in turn and assigned to the variable
fruit for the block of statements following the ‘:’; each statement in this block must
be indented by the same amount of whitespace. Any number of spaces or tab characters
could be used, but it is strongly recommended to use four spaces to indent code.26
Loops can be nested: the inner loop block needs to be indented by the same amount of
whitespace again as the outer loop (i.e., eight spaces):

>>> fruit_list = ['apple', 'melon', 'banana', 'orange']
>>> for fruit in fruit_list:
...     for letter in fruit:
...         print(letter, end='.')
...     print()
...
a.p.p.l.e.
m.e.l.o.n.
b.a.n.a.n.a.
o.r.a.n.g.e.

In this example, we iterate over the string items in fruit_list one by one, and for
each string (fruit name), iterate over its letters. Each letter is printed followed by a full
stop (the body of the inner loop). The last statement of the outer loop, print(), forces
a new line after each fruit.


Example E2.19 We have already briefly met the string method join, which takes a 
sequence of string objects and joins them together in a single string: 

>>> ', '.join(('one', 'two', 'three'))
'one, two, three'
>>> print('\n'.join(reversed(['one', 'two', 'three'])))
three
two
one
>>> ' '.join('hello')
'h e l l o'

Recall that strings are themselves iterable sequences, so the last statement joins the
letters of 'hello' with a single space.


The range type 

Python provides an efficient method of referring to a sequence of numbers that forms
a simple arithmetic progression: $a_n = a_0 + nd$ for $n = 0, 1, 2, \ldots$. In such a sequence,
each term is spaced by a constant value, the stride, $d$. In the simplest case, one
simply needs an integer counter which runs in steps of one from an initial value
of zero: $0, 1, 2, \ldots, N - 1$. It would be possible to create a list to hold each of the
values, but for most purposes this is wasteful of memory: it is easy to generate the next
number in the sequence without having to store all of the numbers at the same time.


26 The use of whitespace as part of the syntax of Python is one of its most contentious aspects. Some people
used to languages such as C and Java, which delimit code blocks with braces ({...}), find it anathema;
others argue that code is almost always indented consistently to make it readable even when this isn’t
enforced by the grammar of the language and consider it less harmful.
Representing such arithmetic progressions for iterating over is the purpose of the
range type. A range object can be constructed with up to three arguments defining the
first integer, the integer to stop at (which is not itself included) and the stride (which
can be negative).

range([a0=0], n, [stride=1])

The notation describing the range constructor here means that if the initial value,
a0, is not given it is taken to be 0; stride is also optional and if it is not given it is
taken to be 1. Some examples:

>>> a = range(5)           # 0,1,2,3,4
>>> b = range(1, 6)        # 1,2,3,4,5
>>> c = range(0, 6, 2)     # 0,2,4
>>> d = range(10, 0, -2)   # 10,8,6,4,2

In Python 3, the object created by range is not a list. 27 Rather it is an iterable object
that can produce integers on demand: range objects can be indexed, cast into lists and
tuples, and iterated over:

>>> c[1]        # i.e. the second element of 0,2,4
2
>>> c[0]
0
>>> list(d)     # make a list from the range
[10, 8, 6, 4, 2]
>>> for x in range(5):
...     print(x)
...
0
1
2
3
4
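
To illustrate the point that a range object produces its integers on demand, here is a
short sketch (the variable name r and the particular numbers are chosen only for
illustration). A range describing hundreds of millions of terms can be created, measured,
indexed and searched without any of those terms being stored:

```python
r = range(0, 10**9, 3)    # describes 0, 3, 6, ... without building a list

print(len(r))             # the number of terms in the progression
print(r[1000])            # the term a_n = a_0 + n*d for n = 1000
print(999 in r)           # membership test: is 999 one of the terms?
print(list(r[:4]))        # slicing a range gives another (small) range
```

Only the last line creates a list, and then only of the four integers requested.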


Example E2.20 The Fibonacci sequence is the sequence of numbers generated by 
applying the rules: 

$$a_1 = a_2 = 1, \quad a_i = a_{i-1} + a_{i-2}.$$

That is, the ith Fibonacci number is the sum of the previous two: 1, 1, 2, 3, 5, 8, 13, ....
We present two ways of generating the Fibonacci series. First, by appending to a list:

Listing 2.1 Calculating the Fibonacci series in a list 


# eg2-i-fibonacci.py
# Calculates and stores the first n Fibonacci numbers

n = 100
fib = [1, 1]
for i in range(2, n+1):
    fib.append(fib[i-1] + fib[i-2])
print(fib)


27 In Python 2, range returned a list and a second method, xrange, created the equivalent to Python 3’s
range object.
Alternatively, we can generate the series without storing more than two numbers at a
time as follows:

Listing 2.2 Calculating the Fibonacci series without storing it 

# eg2-ii-fibonacci.py
# Calculates the first n Fibonacci numbers

n = 100
# Keep track of the two most recent Fibonacci numbers
a, b = 1, 1
print(a, b, end='')
for i in range(2, n+1):
    # The next number (b) is a+b, and a becomes the previous b
    a, b = b, a+b
    print(' ', b, end='')
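
As a check that the two approaches agree, the following sketch wraps each one in a
function (functions are covered in Section 2.7). The names fib_list and fib_last are
hypothetical, and the loops here are assumed to stop at n rather than n+1 so that exactly
n terms are produced:

```python
def fib_list(n):
    """Return the Fibonacci numbers a_1, ..., a_n as a list (needs n >= 2)."""
    fib = [1, 1]
    for i in range(2, n):
        fib.append(fib[i-1] + fib[i-2])
    return fib

def fib_last(n):
    """Return a_n, storing only the two most recent terms (needs n >= 2)."""
    a, b = 1, 1
    for i in range(2, n):
        a, b = b, a + b
    return b

print(fib_list(10))    # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(fib_last(10))    # 55
```

The second function uses the same amount of memory however large n is; the first keeps
every term, which is useful only if the earlier terms are needed again.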


enumerate 

Because range objects can be used to produce a sequence of integers, it is tempting
to use them to provide the indexes of lists or tuples when iterating over them in a for
loop:

>>> mammals = ['kangaroo', 'wombat', 'platypus']
>>> for i in range(len(mammals)):
...     print(i, ':', mammals[i])
...
0 : kangaroo
1 : wombat
2 : platypus

This works, of course, but it is more natural to avoid the explicit construction of a range
object (and the call to the len built-in) by using enumerate. This built-in takes an
iterable object and produces, for each item in turn, a tuple (count, item), consisting
of a counting index and the item itself:

>>> mammals = ['kangaroo', 'wombat', 'platypus']
>>> for i, mammal in enumerate(mammals):
...     print(i, ':', mammal)
...
0 : kangaroo
1 : wombat
2 : platypus

Note that each (count, item) tuple is unpacked in the for loop into the variables 
i and mammal. It is also possible to set the starting value of count to something other 
than 0 (although then it won’t be the index of the item in the original list, of course): 

>>> list(enumerate(mammals, 4)) 

[(4, 'kangaroo'), (5, 'wombat'), (6, 'platypus')] 
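
One common use of a nonzero starting value, sketched here with the same list, is to
number items from 1 for display while still looping over the items directly:

```python
mammals = ['kangaroo', 'wombat', 'platypus']

# Count from 1 for display; i - 1 recovers the index into the list
for i, mammal in enumerate(mammals, start=1):
    print('{}. {}'.format(i, mammal))

numbered = list(enumerate(mammals, start=1))
print(numbered)
```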


0 zip 

What if you want to iterate over two (or more) sequences at the same time? This is what 
the zip built-in function is for: it creates an iterator object in which each item is a tuple 
of items taken in turn from the sequences passed to it: 
>>> a = [1, 2, 3, 4]
>>> b = ['a', 'b', 'c', 'd']
>>> zip(a, b)
<builtins.zip at 0x104476998>
>>> for pair in zip(a, b):
...     print(pair)
...
(1, 'a')
(2, 'b')
(3, 'c')
(4, 'd')
>>> list(zip(a, b))     # convert to list
[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

A nice feature of zip is that it can be used to unzip sequences of tuples as well:

>>> z = zip(a, b)       # zip
>>> A, B = zip(*z)      # unzip
>>> print(A, B)
(1, 2, 3, 4) ('a', 'b', 'c', 'd')
>>> list(A) == a, list(B) == b
(True, True)

zip does not copy the items into a new object, so it is memory-efficient and fast; but 
this means that you only get to iterate over the zipped items once and you can’t index 
it: 28 

>>> z = zip(a, b)
>>> z[0]
TypeError: 'zip' object is not subscriptable
>>> for pair in z:
...     x = 0    # just some dummy operation performed on each iteration
...
>>> for pair in z:
...     print(pair)
...
                 # (nothing: we've already exhausted the iterator z)
>>>
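
A short sketch of zip used to process two sequences in step follows. The lists masses and
velocities are hypothetical data, made deliberately unequal in length to show that zip
stops at the end of the shortest sequence it is given:

```python
masses = [1.0, 2.5, 4.0]
velocities = [3.0, 2.0, 0.5, 9.9]    # one item longer than masses

# zip stops after three pairs: the value 9.9 is never used
momenta = []
for m, v in zip(masses, velocities):
    momenta.append(m * v)
print(momenta)    # [3.0, 5.0, 2.0]

# To index the pairs or iterate over them more than once, make a list
pairs = list(zip(masses, velocities))
print(pairs[0], len(pairs))    # (1.0, 3.0) 3
```

Converting to a list costs memory, so it is worth doing only when the pairs really are
needed more than once.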

2.4.4 Exercises

Questions 

Q2.4.1 Predict and explain the outcome of the following statements using the
variables

s = 'hello'
a = [4, 10, 2]

a. print(s, sep='-')

28 This is another difference between Python 2 and Python 3: in the older version of Python, zip returned a
list of tuples.
c. print(a)

d. print(*a, sep='')

e. list(range(*a))


Q2.4.2 A list could be used as a simple representation of a polynomial, P(x), with
the items as the coefficients of the successive powers of x, and their indexes as the
powers themselves. Thus, the polynomial $P(x) = 4 + 5x + 2x^3$ would be represented by
the list [4, 5, 0, 2]. Why does the following attempt to differentiate a polynomial
fail to produce the correct answer?

>>> P = [4, 5, 0, 2]
>>> dPdx = []
>>> for i, c in enumerate(P[1:]):
...     dPdx.append(i*c)
...
>>> dPdx
[0, 0, 4]    # wrong!

How can this code be fixed? 

Q2.4.3 Given an ordered list of test scores, produce a list associating each score with 
a rank (starting with 1 for the highest score). Equal scores should have the same rank. 
For example, the input list [87, 75, 75, 50, 32, 32] should produce the list of 
rankings [1,2,2,4,5,5]. 

Q2.4.4 Use a for loop to calculate π from the first 20 terms of the Madhava series:



Q2.4.5 For what iterable sequences, x, does the expression
any(x) and not all(x)
evaluate to True?

Q2.4.6 Explain why zip(*z) is the inverse of z = zip(a, b) - that is, while z
pairs the items: (a0, b0), (a1, b1), (a2, b2), ..., zip(*z) separates them
again: (a0, a1, a2, ...), (b0, b1, b2, ...).

Q2.4.7 Sorting a list of tuples arranges them in order of the first element in each tuple 
first. If two or more tuples have the same first element, they are ordered by the second 
element, and so on: 

>>> sorted([(3, 1), (1, 4), (3, 0), (2, 2), (1, -1)])
[(1, -1), (1, 4), (2, 2), (3, 0), (3, 1)]

This suggests a way of using zip to sort one list using the elements of another.
Implement this method on the data below to produce an ordered list of the average amount of
sunshine in hours in London by month. Output the sunniest month first.
Jan      Feb      Mar      Apr      May      Jun
44.7     65.4     101.7    148.3    170.9    171.4

Jul      Aug      Sep      Oct      Nov      Dec
176.7    186.1    133.9    105.4    59.6     45.8


Problems 

P2.4.1 Write a short Python program which, given an array of integers, a, calculates
an array of the same length, p, in which p[i] is the product of all the integers in a
except a[i]. So, for example, if a = [1, 2, 3], then p is [6, 3, 2].

P2.4.2 The Hamming distance between two equal-length strings is the number of
positions at which the characters are different. Write a Python routine to calculate the
Hamming distance between two strings, s1 and s2.

P2.4.3 Using a tuple of strings naming the digits 0-9, create a Python program which
outputs the representation of π as read aloud to 8 decimal places:

three point one four one five nine two six five 

P2.4.4 Write a program to output a nicely formatted depiction of the first eight rows
of Pascal’s Triangle.

P2.4.5 A DNA sequence encodes each amino acid making up a protein as a
three-nucleotide sequence called a codon. For example, the sequence fragment
AGTCTTATATCT contains the codons (AGT, CTT, ATA, TCT) if read from the first position
(“frame”). If read in the second frame it yields the codons (GTC, TTA, TAT) and in the
third (TCT, TAT, ATC).

Write some Python code to extract the codons into a list of 3-letter strings given a 
sequence and frame as an integer value (0, 1 or 2). 

P2.4.6 The factorial function, $n! = 1 \cdot 2 \cdot 3 \cdots (n-1)n$, is the product of the first n
positive integers and is provided by the math module’s factorial method. The double
factorial function, n!!, is the product of the positive odd integers up to and including n
(which must itself be odd):

$$n!! = \prod_{i=1}^{(n+1)/2} (2i - 1) = 1 \cdot 3 \cdot 5 \cdots (n-2) \cdot n.$$

Write a routine to calculate n!! in Python.

As a bonus exercise, extend the formula to allow for even n as follows:

$$n!! = \prod_{i=1}^{n/2} (2i) = 2 \cdot 4 \cdot 6 \cdots (n-2) \cdot n.$$

P2.4.7 Benford’s Law is an observation about the distribution of the frequencies of the
first digits of the numbers in many different data sets. It is frequently found that the first
digits are not uniformly distributed, but follow the logarithmic distribution

$$P(d) = \log_{10}\left(\frac{d+1}{d}\right).$$

That is, numbers starting with 1 are more common than those starting with 2, and so on,
with those starting with 9 the least common. The probabilities follow:

1 0.301 

2 0.176 

3 0.125 

4 0.097 

5 0.079 

6 0.067 

7 0.058 

8 0.051 

9 0.046 


Benford’s Law is most accurate for data sets which span several orders of magnitude,
and can be proved to be exact for some infinite sequences of numbers.

1. Demonstrate that the first digits of the first 500 Fibonacci numbers (see Example
E2.20) follow Benford’s Law quite closely.

2. The lengths of the amino acid sequences of 500 randomly chosen proteins are
provided in the file ex2-4_e_ii_protein_lengths.py which can be downloaded from
scipython.com/ex/aba. This file contains a list, naa, which can be
imported at the start of your program with

from ex2-4_e_ii_protein_lengths import naa

To what extent does the distribution of protein lengths obey Benford’s Law?


2.5 Control flow 

Few computer programs are executed in a purely linear fashion, one statement after
another as written in the source code. It is more likely that during the program execution,
data objects are inspected and blocks of code executed conditionally on the basis of
some test carried out on them. Thus, all practical languages have the equivalent of an
if-then-(else) construction. This section explains the syntax of Python’s version of this
clause and covers a further kind of loop: the while loop.


2.5.1 if ... elif ... else

The if ... elif ... else construction allows statements to be executed
conditionally, depending on the result of one or more logical tests (which evaluate to the
boolean values True or False):
if <logical expression 1>:
    <statements 1>
elif <logical expression 2>:
    <statements 2>
...
else:
    <statements>


That is, if <logical expression 1> evaluates to True, <statements 1> are
executed; otherwise, if <logical expression 2> evaluates to True, <statements 2>
are executed, and so on; if none of the preceding logical expressions evaluate to
True, the statements in the block of code following else: are executed. These
statement blocks are indented with whitespace, as for the for loop. For example,

for x in range(10):
    if x <= 3:
        print(x, 'is less than or equal to three')
    elif x > 5:
        print(x, 'is greater than five')
    else:
        print(x, 'must be four or five, then')

produces the output: 

0 is less than or equal to three 

1 is less than or equal to three 

2 is less than or equal to three 

3 is less than or equal to three 

4 must be four or five, then 

5 must be four or five, then 

6 is greater than five 

7 is greater than five 

8 is greater than five 

9 is greater than five 

It is not necessary to enclose test expressions such as x <= 3 in parentheses, as it is
in C, for example, but the colon following the test is mandatory. The test expressions
don’t, in fact, have to evaluate explicitly to the boolean values True and False: as we
have seen, other data types are taken to be equivalent to True unless they are 0 (int)
or 0. (float), the empty string, '', the empty list, [], the empty tuple, (), and so forth, or
Python’s special type, None (see Section 2.2.4). Consider:

for x in range(10):
    if x % 2:
        print(x, 'is odd!')
    else:
        print(x, 'is even!')

This works because x % 2 = 1 for odd integers, which is equivalent to True, and x % 2 = 0
for even integers, which is equivalent to False.

There is no switch ... case ... finally construction in Python - equivalent
control flow can be achieved with if ... elif ... else or with dictionaries (see
Section 4.2).
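
As a minimal sketch of the dictionary approach (dictionaries are introduced properly in
Section 4.2; the function name describe and the three entries are hypothetical), each
"case" becomes a key and the default case is handled by the second argument to the
dictionary's get method:

```python
def describe(element_symbol):
    """Return the name of an element given its symbol (illustrative only)."""
    names = {'H': 'hydrogen', 'He': 'helium', 'Li': 'lithium'}
    # get returns its second argument if the key is missing, which
    # plays the role of the "default" branch of a switch statement
    return names.get(element_symbol, 'unknown element')

print(describe('He'))    # helium
print(describe('Xx'))    # unknown element
```

This suits the case where every branch simply selects a value; where the branches carry
out different tests or actions, a chain of if ... elif ... else is usually clearer.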
Example E2.21 In the Gregorian calendar a year is a leap year if it is divisible by 4
with the exceptions that years divisible by 100 are not leap years unless they are also
divisible by 400. The following Python program determines if year is a leap year.

Listing 2.3 Determining if a year is a leap year 

year = 1900

if not year % 400:
    is_leap_year = True
elif not year % 100:
    is_leap_year = False
elif not year % 4:
    is_leap_year = True
else:
    is_leap_year = False

s_ly = 'is a' if is_leap_year else 'is not a'
print('{:4d} {:s} leap year'.format(year, s_ly))


Hence the output: 

1900 is not a leap year 
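
One way to gain confidence in these rules is to compare them with the isleap function
of the standard library's calendar module. The sketch below repackages the same tests
as a function (functions are covered in Section 2.7); the name is_leap is hypothetical:

```python
import calendar

def is_leap(year):
    """Apply the Gregorian leap year rules in the order used in Listing 2.3."""
    if not year % 400:
        return True
    if not year % 100:
        return False
    return not year % 4

# Print our answer alongside the standard library's for a few years
for year in (1900, 2000, 2012, 2015):
    print(year, is_leap(year), calendar.isleap(year))
```

The order of the tests matters: the most specific condition (divisibility by 400) must be
checked before the more general ones.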


2.5.2 while loops

Whereas a for loop is established for a fixed number of iterations, statements within
the block of a while loop execute only if, and for as long as, some condition holds:

>>> i = 0
>>> while i < 10:
...     i += 1
...     print(i, end='.')
...
>>> print()
1.2.3.4.5.6.7.8.9.10.

The counter i is initialized to 0, which is less than 10 so the while loop begins. On
each iteration, i is incremented by one and its value printed. When i reaches 10, on the
following iteration i < 10 is False: the loop ends and execution continues after the
loop, where print() outputs a newline.


Example E2.22 A more interesting example of the use of a while loop is given by this
implementation of Euclid’s algorithm for finding the greatest common divisor of two
numbers, gcd(a, b):

>>> a, b = 1071, 462
>>> while b:
...     a, b = b, a % b
...
>>> print(a)
21
The loop continues until b divides a exactly; on each iteration, a is set to the old value
of b and b is set to the remainder of the division of the old a by the old b (a % b). Recall
that the integer 0 evaluates as boolean False so while b: is equivalent to while b != 0:.
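
To see how the pair (a, b) evolves, the loop can be instrumented with print calls, as in
this sketch. The final comparison assumes Python 3.5 or later, in which the math module
provides its own gcd function:

```python
import math

a, b = 1071, 462
print('a =', a, ' b =', b)
while b:
    a, b = b, a % b
    print('a =', a, ' b =', b)

# a now holds the greatest common divisor; compare with the library result
print(a == math.gcd(1071, 462))
```

The successive pairs are (1071, 462), (462, 147), (147, 21) and (21, 0), at which point b
is zero and the loop ends.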


2.5.3 More control flow: break, pass, continue and else 
break 

Python provides three further statements for controlling the flow of a program. The 
break command, issued inside a loop, immediately ends that loop and moves execution 
to the statements following the loop: 

x = 0
while True:
    x += 1
    if not (x % 15 or x % 25):
        break
print(x, 'is divisible by both 15 and 25')

The while loop condition here is (literally) always True so the only escape from the
loop occurs when the break statement is reached. This occurs only when the counter x
is divisible by both 15 and 25. The output is therefore:

75 is divisible by both 15 and 25 

Similarly, to find the index of the first occurrence of a negative number in a list: 

alist = [0, 4, 5, -2, 5, 10]
for i, a in enumerate(alist):
    if a < 0:
        break
print(a, 'occurs at index', i)

Note that after escaping from the loop, the variables i and a have the values that they 
had within the loop at the break statement. 

continue 

The continue statement acts in a similar way to break but instead of breaking out 
of the containing loop, it immediately forces the next iteration of the loop without 
completing the statement block for the current iteration. For example, 

for i in range(1, 11):
    if i % 2:
        continue
    print(i, 'is even!')

prints only the even integers 2, 4, 6, 8, 10: if i is not divisible by 2 (and hence i % 2
is 1, equivalent to True), that loop iteration is canceled and the loop resumed with the
next value of i (the print statement is skipped).

pass 

The pass command does nothing. It is useful as a “stub” for code that has not yet 
been written but where a statement is syntactically required by Python’s whitespace 
convention. 
>>> for i in range(1, 11):
...     if i == 6:
...         pass    # do something special if i is 6
...     if not i % 3:
...         print(i, 'is divisible by 3')
...
3 is divisible by 3
6 is divisible by 3
9 is divisible by 3

If the pass statement had been continue the line 6 is divisible by 3 would not
have been printed: execution would have returned to the top of the loop and i=7 instead
of continuing to the second if statement.

0 else 

A for or while loop may be followed by an else block of statements, which will 
be executed only if the loop finished “normally” (that is, without the intervention of a 
break). For for loops, this means these statements will be executed after the loop has 
reached the end of the sequence it is iterating over; for while loops, they are executed 
when the while condition becomes False. For example, consider again our program 
to find the first occurrence of a negative number in a list. This code behaves rather oddly 
if there aren’t any negative numbers in the list: 

>>> alist = [0, 4, 5, 2, 5, 10]
>>> for i, a in enumerate(alist):
...     if a < 0:
...         break
...
>>> print(a, 'occurs at index', i)
10 occurs at index 5

It outputs the index and number of the last item in the list (whether it is negative 
or not). A way to improve this is to notice when the for loop runs through every 
item without encountering a negative number (and hence the break) and output a 
message: 

>>> alist = [0, 4, 5, 2, 5, 10]
>>> for i, a in enumerate(alist):
...     if a < 0:
...         print(a, 'occurs at index', i)
...         break
... else:
...     print('no negative numbers in the list')
...
no negative numbers in the list

As another example, consider this (not particularly elegant) routine for finding the 
largest factor of a number a > 2: 

a = 1013
b = a - 1
while b != 1:
    if not a % b:
        print('the largest factor of', a, 'is', b)
        break
    b -= 1
else:
    print(a, 'is prime!')

b is the largest factor not equal to a. The while loop continues as long as b is not equal
to 1 (in which case a is prime) and decrements b after testing if b divides a exactly; if
it does, b is the highest factor of a, and we break out of the while loop.
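
The same search can be sketched with a for loop over a descending range, which removes
the need to initialize and decrement b by hand. Here it is wrapped in a function (see
Section 2.7) with the hypothetical name largest_factor, which returns None to signal that
its argument is prime:

```python
def largest_factor(a):
    """Return the largest factor of a (a > 2) less than a, or None if a is prime."""
    # range(a - 1, 1, -1) counts down: a-1, a-2, ..., 2
    for b in range(a - 1, 1, -1):
        if not a % b:
            break
    else:
        # Reached only if the loop ran to completion without a break
        return None
    return b

print(largest_factor(1013))    # None: 1013 is prime
print(largest_factor(1000))    # 500
```

Like the while version, this is far from efficient: it tests every candidate below a,
whereas no proper factor can exceed a/2.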


Example E2.23 A simple “turtle” virtual robot lives on an infinite two-dimensional 
plane on which its location is always an integer pair of (x,y) coordinates. It can face 
only in directions parallel to the x and y axes (i.e. ‘North,’ ‘East,’ ‘South’ or ‘West’) and 
it understands four commands: 

• F: move forward one unit;

• L: turn left (counterclockwise) by 90°;

• R: turn right (clockwise) by 90°;

• S: stop and exit.

The following Python program takes a list of such commands as a string and tracks
the turtle’s location. The turtle starts at (0,0), facing in the direction (1,0) (‘East’).
The program ignores (but warns about) invalid commands and reports when the turtle
crosses its own path.

Listing 2.4 A virtual turtle robot


# eg2-turtle.py

commands = 'FFFFFLFFFLFFFFRRRFXFFFFFFS'

# Current location, current facing direction
x, y = 0, 0
dx, dy = 1, 0
# Keep track of the turtle's location in the list of tuples, locs
locs = [(0, 0)]

for cmd in commands:                                        # (1)
    if cmd == 'S':
        # Stop command
        break
    if cmd == 'F':
        # Move forward in the current direction
        x += dx
        y += dy
        if (x, y) in locs:
            print('Path crosses itself at: ({}, {})'.format(x, y))
        locs.append((x, y))
        continue
    if cmd in 'LR':
        # Turn to the left (counterclockwise) or right (clockwise)
        # L => (dx, dy): (1,0) -> (0, 1) -> (-1,0) -> (0,-1) -> (1,0)
        # R => (dx, dy): (1,0) -> (0,-1) -> (-1,0) -> (0, 1) -> (1,0)
        sgn = 1
        if dy != 0:
            sgn = -1
        if cmd == 'R':
            sgn = -sgn
        dx, dy = sgn * dy, sgn * dx
        continue
    # if we're here it's because we don't recognize the command: warn
    print('Unknown command:', cmd)
else:                                                       # (2)
    # We exhausted the commands without encountering an S for STOP
    print('Instructions ended without a STOP')

# Plot a path of asterisks
# First find the total range of x and y values encountered
x, y = zip(*locs)                                           # (3)
xmin, xmax = min(x), max(x)
ymin, ymax = min(y), max(y)
# The grid size needed for the plot is (nx, ny)
nx = xmax - xmin + 1
ny = ymax - ymin + 1
# Reverse the y-axis so that it decreases *down* the screen
for iy in reversed(range(ny)):
    for ix in range(nx):
        if (ix+xmin, iy+ymin) in locs:
            print('*', end='')
        else:
            print(' ', end='')
    print()

(1) We can iterate over the string commands to take its characters one at a time.

(2) Note that the else: clause to the for loop is only executed if we do not break out
of it on encountering a STOP command.

(3) We unzip the list of tuples, locs, into separate sequences of the x and y coordinates
with zip(*locs).

The output produced from the commands given is: 

Unknown command: X
Path crosses itself at: (1, 0)
 *****
 *   *
 *   *
******
 *
 *
 *
 *
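
The turning logic in the listing can also be written without the sgn variable, since a
rotation by 90° counterclockwise maps (dx, dy) to (-dy, dx) and a rotation by 90°
clockwise maps it to (dy, -dx). The following sketch uses a hypothetical helper function,
turn, which is not part of the original program:

```python
def turn(dx, dy, cmd):
    """Return the new facing direction after an 'L' or 'R' command."""
    if cmd == 'L':
        return -dy, dx     # rotate 90 degrees counterclockwise
    return dy, -dx         # rotate 90 degrees clockwise

# Four left turns from 'East' visit North, West, South and East again
d = (1, 0)
for _ in range(4):
    d = turn(*d, 'L')
    print(d)
```

This reproduces the cycles given in the comments of the listing: (1,0) -> (0,1) ->
(-1,0) -> (0,-1) -> (1,0) for L, and the reverse order for R.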


2.5.4 Exercises 
Questions 

Q2.5.1 Write a Python program to normalize a list of numbers, a, such that its values
lie between 0 and 1. Thus, for example, the list a = [2, 4, 10, 6, 8, 4] becomes
[0.0, 0.25, 1.0, 0.5, 0.75, 0.25].
Hint: use the built-ins min and max, which return the minimum and maximum values
in a sequence respectively; for example, min(a) returns 2 in the earlier mentioned list.

Q2.5.2 Write a while loop to calculate the arithmetic-geometric mean (AGM) of two
positive real numbers, x and y, defined as the limit of the sequences:

$$a_{n+1} = \tfrac{1}{2}(a_n + b_n), \qquad b_{n+1} = \sqrt{a_n b_n},$$

starting with $a_0 = x$, $b_0 = y$. Both sequences converge to the same number, denoted
agm(x, y). Use your loop to determine Gauss’s constant, $G = 1/\mathrm{agm}(1, \sqrt{2})$.

Q2.5.3 The game of “Fizzbuzz” involves counting, but replacing numbers divisible by
3 with the word ‘Fizz,’ those divisible by 5 with ‘Buzz,’ and those divisible by both 3
and 5 with ‘FizzBuzz.’ Write a program to play this game, counting up to 100.

Q2.5.4 Straight-chain alkanes are hydrocarbons with the general stoichiometric
formula $C_nH_{2n+2}$, in which the carbon atoms form a simple chain: for example, butane,
$C_4H_{10}$, has the structural formula that may be depicted $H_3CCH_2CH_2CH_3$. Write a
program to output the structural formula of such an alkane, given its stoichiometry (assume
n > 1). For example, given stoich='C8H18', the output should be
H3C-CH2-CH2-CH2-CH2-CH2-CH2-CH3


Problems 


P2.5.1 Modify your solution to Problem P2.4.4 to output the first 50 rows of Pascal’s
triangle, but instead of the numbers themselves, output an asterisk if the number is odd
and a space if it is even.

P2.5.2 The iterative weak acid approximation determines the hydrogen ion
concentration, $[\mathrm{H}^+]$, of an acid solution from the acid dissociation constant, $K_a$, and the acid
concentration, c, by successive application of the formula



starting with $[\mathrm{H}^+]_0 = 0$. The iterations are continued until $[\mathrm{H}^+]$ changes by less than
some predetermined, small tolerance value.

Use this method to determine the hydrogen ion concentration, and hence the pH
($= -\log_{10}[\mathrm{H}^+]$) of a c = 0.01 M solution of acetic acid ($K_a = 1.78 \times 10^{-5}$). Use
the tolerance tol = 1.e-10.

P2.5.3 The Luhn algorithm is a simple checksum formula used to validate credit card 
and bank account numbers. It is designed to prevent common errors in transcribing the 
number, and detects all single-digit errors and almost all transpositions of two adjacent 
digits. The algorithm may be written as the following steps: 
1. Reverse the number.

2. Treating the number as an array of digits, take the even-indexed digits (where
the indexes start at 1) and double their values. If a doubled digit results in a
number greater than 9, add the two digits (e.g., the digit 6 becomes 12 and hence
1+2 = 3).

3. Sum this modified array.

4. If the sum of the array modulo 10 is 0 the credit card number is valid.

Write a Python program to take a credit card number as a string of digits (possibly in
groups, separated by spaces) and establish if it is valid or not. For example, the string
'4799 2739 8713 6272' is a valid credit card number, but any number with a single
digit in this string changed is not.

P2.5.4 Hero’s method for calculating the square root of a number, S, is as follows:
starting with an initial guess, $x_0$, the sequence of numbers $x_{n+1} = \frac{1}{2}(x_n + S/x_n)$ are
successively better approximations to $\sqrt{S}$. Implement this algorithm to estimate the
square root of 2117519.73 to two decimal places and compare with the “exact” answer
provided by the math.sqrt method. For the purpose of this exercise, start with an
initial guess, $x_0 = 2000$.

P2.5.5 Write a program to determine tomorrow’s date given a string representing
today’s date, today, as either “d/m/y” or “m/d/y.” Cater for both British and
US-style dates when parsing today according to the value of a boolean variable
us_date_style. For example, when us_date_style is False and today is
'3/4/2014', tomorrow’s date should be reported as '4/4/2014'. 29 (Hint: use the
algorithm for determining if a year is a leap year, which is provided in the example to
Section 2.5.1.)

P2.5.6 Write a Python program to determine f(n), the number of trailing zeros in n!,
using the special case of de Polignac’s formula:



where ⌊x⌋ denotes the floor of x, the largest integer less than or equal to x.

P2.5.7 The hailstone sequence starting at an integer n > 0 is generated by the repeated 
application of the three rules: 

• if n = 1, the sequence ends; 

• if n is even, the next number in the sequence is n/2;

• if n is odd, the next number in the sequence is 3n + 1. 

a. Write a program to calculate the hailstone sequence starting at 27. 

b. Let the stopping time be the number of numbers in a given hailstone sequence.
Modify your hailstone program to return the stopping time instead of the numbers
themselves. Adapt your program to demonstrate that the hailstone sequences
started with 1 < n < 100 agree with the Collatz conjecture (that all hailstone
sequences stop eventually).


29 In practice, it would be better to use Python’s datetime library (described in Section 4.5.3), but avoid it
for this exercise.


Downloaded from http:/www.cambridge.org/core. University of Illinois at Urbana - Champaign Library, on 28 Dec 2016 at 09:02:36, subject to the Cambridge Core
terms of use, available at http:/www. cambridge.org/core/terms. http://dx.doi.org/10.1017/CB09781 139871754.002

P2.5.8 The algorithm known as the Sieve of Eratosthenes finds the prime numbers in
a list 2, 3, ..., n. It may be summarized as follows, starting at p = 2, the first prime
number:

Step 1. Mark all the multiples of p in the list as nonprime (that is, the numbers
mp where m = 2, 3, 4, ...): these numbers are composite.

Step 2. Find the first unmarked number greater than p in the list. If there is no such 
number, stop. 

Step 3. Let p equal this new number and return to Step 1. 

When the algorithm stops, the unmarked numbers are the primes. 

Implement the Sieve of Eratosthenes in a Python program and find all the primes 
under 10000. 

P2.5.9 Euler’s totient function, φ(n), counts the number of positive integers less than
or equal to n that are relatively prime to n. (Two numbers, a and b, are relatively prime
if the only positive integer that divides both of them is 1; that is, if gcd(a, b) = 1.)
Write a Python program to compute φ(n) for 1 < n < 100.

(Hint: you could use Euclid’s algorithm for the greatest common divisor given in the
example to Section 2.5.2.)

P2.5.10 The value of π may be approximated by Monte Carlo methods. Consider the
region of the xy-plane bounded by 0 < x < 1 and 0 < y < 1. By selecting a large
number of random points within this region and counting the proportion of them lying
beneath the function $y = \sqrt{1 - x^2}$ describing a quarter-circle, one can estimate π/4,
this being the area bounded by the axes and y(x). Write a program to estimate the value
of π by this method.

Hint: use Python’s random module. The method random.random() generates a
(pseudo-)random number between 0. and 1. See Section 4.5.1 for more information.

P2.5.11 Write a program to take a string of text (words, perhaps with punctuation, 
separated by spaces) and output the same text with the middle letters shuffled randomly. 
Keep any punctuation at the end of words. For example, the string: 

Four score and seven years ago our fathers brought forth on this continent a new nation, conceived 
in liberty, and dedicated to the proposition that all men are created equal. 
might be rendered: 

Four sorce and seevn yeras ago our fhtaers bhrogut ftroh on this cnnoientt a new noitan, cvieecond 
in lbrteiy, and ddicetead to the ptosoiporin that all men are cetaerd euaql. 

Hint: random.shuffle shuffles a list of items in place. See Section 4.5.1.

P2.5.12 The electron configuration of an atom is the specification of the distribution of
its electrons in atomic orbitals. An atomic orbital is identified by a principal quantum
number, n = 1, 2, 3, ..., defining a shell composed of one or more subshells defined
by the azimuthal quantum number, l = 0, 1, 2, ..., n − 1. The values l = 0, 1, 2, 3 are
referred to by the letters s, p, d and f, respectively. Thus, the first few orbitals are 1s
(n = 1, l = 0), 2s (n = 2, l = 0), 2p (n = 2, l = 1), 3s (n = 3, l = 0), and each shell has
n subshells. A maximum of 2(2l + 1) electrons may occupy a given subshell.

According to the Madelung rule, the N electrons of an atom fill the orbitals in order
of increasing n + l such that whenever two orbitals have the same value of n + l, they
are filled in order of increasing n. For example, the ground state of Titanium (N = 22)
is predicted (and found) to be 1s² 2s² 2p⁶ 3s² 3p⁶ 4s² 3d².

Write a program to predict the electronic configurations of the elements up to
Rutherfordium (N = 104). The output for Titanium should be

Ti: 1s2.2s2.2p6.3s2.3p6.4s2.3d2

A Python list containing the element symbols in order may be downloaded from
scipython.com/ex/abb.

As a bonus exercise, modify your program to output the configurations using the
convention that the part of the configuration corresponding to the outermost closed shell,
a noble gas configuration, is replaced by the noble gas symbol in square brackets; thus,

Ti: [Ar].4s2.3d2

the configuration of Argon being 1s2.2s2.2p6.3s2.3p6.


2.6 File input/output 

Until now, data has been hard-coded into our Python programs, and output has been to
the console (the terminal). Of course, it will frequently be necessary to input data from
an external file and to write data to an output file. To achieve this, Python has file
objects.


2.6.1 Opening and closing a file 

A file object is created by opening a file with a given filename and mode. The filename 
may be given as an absolute path, or as a path relative to the directory in which the 
program is being executed. mode is a string with one of the values given in Table 2.13. 
For example, to open a file for text-mode writing:

>>> f = open('myfile.txt', 'w')

file objects are closed with the close method: for example, f.close(). Python
closes any open file objects automatically when a program terminates.


2.6.2 Writing to a file 

The write method of a file object writes a string to the file and returns the number
of characters written:

>>> f.write('Hello World!')
12

Table 2.13 File modes

mode argument   Open mode
r               text, read-only (the default)
w               text, write (an existing file with the same name will be overwritten)
a               text, append to an existing file
r+              text, reading and writing
rb              binary, read-only
wb              binary, write (an existing file with the same name will be overwritten)
ab              binary, append to an existing file
rb+             binary, reading and writing


More helpfully, the print built-in takes an argument, file, to specify where to
redirect its output:

>>> print(35, 'Cl', 2, sep='', file=f)

writes ‘35Cl2’ to the file opened as file object f instead of to the console.
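The two ways of writing to a file described above can be compared in a short sketch. The scratch directory created with tempfile and the filename demo.txt are assumptions made purely for illustration, so that no existing file is overwritten; the file is read back at the end only to check what was written.

```python
import os
import tempfile

# Hypothetical scratch location, chosen so that no real file is overwritten.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

f = open(path, 'w')
nchars = f.write('Hello World!\n')    # write() adds no newline of its own
print(35, 'Cl', 2, sep='', file=f)    # print() appends '\n' by default
f.close()

# Read the file back to confirm its contents.
f = open(path, 'r')
contents = f.read()
f.close()
print(nchars, repr(contents))
```

Note that write returns the number of characters written, including the explicit newline, whereas print supplies the line ending itself.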


Example E2.24 The following program writes the first four powers of the numbers
between 1 and 1,000 in comma-separated fields to the file powers.txt:

f = open('powers.txt', 'w')
for i in range(1, 1001):
    print(i, i**2, i**3, i**4, sep=', ', file=f)
f.close()

The file contents are

1, 1, 1, 1
2, 4, 8, 16
3, 9, 27, 81
...
999, 998001, 997002999, 996005996001
1000, 1000000, 1000000000, 1000000000000


2.6.3 Reading from a file 

To read n characters from a file opened in text mode (or n bytes from one opened in
binary mode), call f.read(n). If n is omitted, the entire file is read in.³⁰

³⁰ To quote the official documentation: “it’s your problem if the file is twice as large as your machine’s
memory.”

readline() reads a single line from the file, up to and including the newline character.
The next call to readline() reads in the next line, and so on. Both read() and
readline() return an empty string when they reach the end of the file.

To read all of the lines into a list of strings in one go, use f.readlines().
file objects are iterable, and looping over a (text) file returns its lines one at a
time:

>>> for line in f:
...     print(line, end='')    # ❶
...
First line
Second line

❶ Because line retains its newline character when read in, we use end='' to prevent
print from adding another, which would be output as a blank line.

You probably want to use this method if your file is very large unless you really
do want to store every line in memory. See Section 4.3.4 concerning Python’s with
statement for more best practice in file handling.
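As a rough sketch of how the reading methods described in this section differ, the following creates a small two-line file and reads it back in three ways. The temporary directory and the filename lines.txt are assumptions made for illustration only.

```python
import os
import tempfile

# Hypothetical two-line scratch file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'lines.txt')
f = open(path, 'w')
f.write('First line\nSecond line\n')
f.close()

f = open(path)
first = f.readline()        # 'First line\n': the newline is retained
rest = f.read()             # everything that remains in the file
at_end = f.read()           # '' signals that the end of the file was reached
f.close()

f = open(path)
all_lines = f.readlines()   # a list holding every line in memory at once
f.close()

f = open(path)
stripped = []
for line in f:              # one line at a time: suitable for large files
    stripped.append(line.rstrip('\n'))
f.close()

print(repr(first), repr(rest), repr(at_end))
print(all_lines, stripped)
```

Each call to open starts reading from the beginning of the file again, which is why the file is reopened between the three approaches.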


Example E2.25 To read in the numbers from the file powers.txt generated in the
previous example, the columns must be converted to lists of integers. To do this, each
line must be split into its fields and each field explicitly converted to an int:

f = open('powers.txt', 'r')
squares, cubes, fourths = [], [], []
for line in f.readlines():
    fields = line.split(',')
    squares.append(int(fields[1]))
    cubes.append(int(fields[2]))
    fourths.append(int(fields[3]))
f.close()
n = 500
print(n, 'cubed is', cubes[n-1])

The output is

500 cubed is 125000000

In practice, it is better to use numpy (see Chapter 6) to read in data files such as these. 


2.6.4 Exercises 

Problems 

P2.6.1 The coast redwood tree species, Sequoia sempervirens, includes some of
the oldest and tallest living organisms on Earth. Some details concerning individual
trees are given in the tab-delimited text file redwood-data.txt, available at
scipython.com/ex/abd. (Data courtesy of the Gymnosperm database,
www.conifers.org/cu/Sequoia.php)

Write a Python program to read in this data and report the tallest tree and the tree
with the greatest diameter.

P2.6.2 Write a program to read in a text file and censor any words in it that are on a
list of banned words by replacing their letters with the same number of asterisks. Your
program should store the banned words in lowercase but censor examples of these words
in any case. Assume there is no punctuation.

As a bonus exercise, handle text that contains punctuation. For example, given the
list of banned words: ['C', 'Perl', 'Fortran'], the sentence

'Some alternative programming languages to Python are C, C++, Perl, Fortran and Java.'

becomes

'Some alternative programming languages to Python are *, C++, ****, ******* and Java.'


Table 2.14 Parameters used in the definition of ESI

i   Parameter                 Earth value, x_{i,⊕}   Weight, w_i
1   Radius                    1.0                    0.57
2   Density                   1.0                    1.07
3   Escape velocity, v_esc    1.0                    0.7
4   Surface temperature       288 K                  5.58


P2.6.3 The Earth Similarity Index (ESI) attempts to quantify the physical similarity
between an astronomical body (usually a planet or moon) and Earth. It is defined by

\[
\mathrm{ESI}_j = \prod_{i=1}^{n} \left( 1 - \left| \frac{x_{i,j} - x_{i,\oplus}}{x_{i,j} + x_{i,\oplus}} \right| \right)^{w_i/n},
\]

where the parameters x_{i,j} are described, and their terrestrial values, x_{i,⊕}, and weights, w_i,
given, in Table 2.14. The radius, density and escape velocities are taken relative to the
terrestrial values. The ESI lies between 0 and 1, with the values closer to 1 indicating 
closer similarity to Earth (which has an ESI of exactly 1: Earth is identical to itself!). 

The file ex2-6-g-esi-data.txt available from scipython.com/ex/abc contains 
the earlier mentioned parameters for a range of astronomical bodies. Use these data to 
calculate the ESI for each of the bodies. Which has properties “closest” to those of the 
Earth? 


P2.6.4 Write a program to read in a two-dimensional array of strings into a list of 
lists from a file in which the string elements are separated by one or more spaces. The 
number of rows, m, and columns, n, may not be known in advance of opening the file. 
For example, the text file 

A B C D 
E F G H 
I J K L 

should create an object, grid, as 

[['A', 'B', 'C', 'D'], ['E', 'F', 'G', 'H'], ['I', 'J', 'K', 'L']]

Read like this, grid contains a list of the array’s rows. Once the array has been read in, 
write loops to output the columns of the array: 

[['A', 'E', 'I'], ['B', 'F', 'J'], ['C', 'G', 'K'], ['D', 'H', 'L']] 

Harder: also output all its diagonals read in one direction:

[['A'], ['B', 'E'], ['C', 'F', 'I'], ['D', 'G', 'J'], ['H', 'K'], ['L']]

and the other direction:

[['D'], ['C', 'H'], ['B', 'G', 'L'], ['A', 'F', 'K'], ['E', 'J'], ['I']]


2.7 Functions 

A Python function is a set of statements that are grouped together and named so that 
they can be run more than once in a program. There are two main advantages to using 
functions. First, they enable code to be reused without having to be replicated in different
parts of the program; second, they enable complex tasks to be broken into separate
procedures, each implemented by its own function - it is often much easier and more 
maintainable to code each procedure individually than to code the entire task at once. 

2.7.1 Defining and calling functions 

The def statement defines a function, gives it a name, and lists the arguments (if any)
that the function expects to receive when called. The function’s statements are written
in an indented block following this def. If at any point during the execution of this
statement block a return statement is encountered, the specified values are returned to
the caller. For example,

>>> def square(x):                                  # ❶
...     x_squared = x**2
...     return x_squared
...
>>> number = 2
>>> number_squared = square(number)                 # ❷
>>> print(number, 'squared is', number_squared)
2 squared is 4
>>> print('8 squared is', square(8))                # ❸
8 squared is 64

❶ The simple function named square takes a single argument, x. It calculates x**2
and returns this value to the caller. Once defined, it can be called any number of times.
❷ In the first example, the return value is assigned to the variable number_squared;
❸ in the second example, it is fed straight into the print function for output to the
console.

To return two or more values from a function, pack them into a tuple. For example,
the following program defines a function to return both roots of the quadratic equation
ax² + bx + c = 0 (assuming it has two real roots):

import math

def roots(a, b, c):
    d = b**2 - 4*a*c
    r1 = (-b + math.sqrt(d)) / 2 / a
    r2 = (-b - math.sqrt(d)) / 2 / a
    return r1, r2

print(roots(1., -1., -6.))

When run, this program outputs, as expected:

(3.0, -2.0) 
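The tuple returned by such a function can be kept as a single object or unpacked into separate names by the caller. The following sketch repeats the roots function so that it is self-contained; the names result, x1 and x2 are chosen here for illustration.

```python
import math

def roots(a, b, c):
    """Return both roots of ax^2 + bx + c = 0, assuming they are real."""
    d = b**2 - 4*a*c
    r1 = (-b + math.sqrt(d)) / 2 / a
    r2 = (-b - math.sqrt(d)) / 2 / a
    return r1, r2

result = roots(1., -1., -6.)    # the two values arrive as one tuple object
x1, x2 = roots(1., -1., -6.)    # or they can be unpacked into two names
print(type(result).__name__, x1, x2)
```

Unpacking requires the number of names on the left to match the number of values returned.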

It is not necessary for a function to explicitly return any object: functions that fall 
off the end of their indented block without encountering a return statement return 
Python’s special value, None. 

Function definitions can appear anywhere in a Python program, but a function cannot 
be referenced before it is defined. Functions can even be nested, but a function defined 
inside another is not (directly) accessible from outside that function. 
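A minimal sketch of these two points follows: a function with no return statement yields None, and a nested function cannot be called by name from outside the function that defines it. The function names greet, outer and inner are illustrative only.

```python
def greet(name):
    print('Hello,', name)       # no return statement in this function

def outer():
    def inner():                # the name inner is local to outer
        return 'inner called'
    return inner()

value = greet('world')          # prints the greeting; value is None
inner_result = outer()          # outer can call inner on our behalf

try:
    inner()                     # inner is not defined at this level
    inner_visible = True
except NameError:
    inner_visible = False

print(value, inner_result, inner_visible)
```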

Docstrings 

A function docstring is a string literal that occurs as the first statement of the function
definition. It should be written as a triple-quoted string on a single line if the function
is simple, or on multiple lines with an initial one-line summary for more detailed
descriptions of complex functions. For example,

def roots(a, b, c):
    """Return the roots of ax^2 + bx + c."""
    d = b**2 - 4*a*c

The docstring becomes the special __doc__ attribute of the function:

>>> roots.__doc__
'Return the roots of ax^2 + bx + c.'

A docstring should provide details about how to use the function: which arguments to
pass it and which objects it returns,³¹ but should not generally include details of the
specific implementation of algorithms used by the function (these are best explained in
comments, preceded by #).

Docstrings are also used to provide documentation for classes and modules (see 
Sections 4.5 and 4.6.2). 
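For a function that needs more explanation, a multi-line docstring might look like the following sketch. The wording of the docstring is an illustration rather than a prescribed format, and d**0.5 is used here in place of math.sqrt only to keep the example free of imports.

```python
def roots(a, b, c):
    """Return the roots of ax^2 + bx + c = 0.

    a, b and c are the real coefficients of the quadratic. The roots are
    returned as a tuple, (r1, r2), and are assumed to be real.

    """
    # Implementation detail, so a comment rather than docstring text:
    # the discriminant is evaluated once and used for both roots.
    d = b**2 - 4*a*c
    r1 = (-b + d**0.5) / 2 / a
    r2 = (-b - d**0.5) / 2 / a
    return r1, r2

summary = roots.__doc__.splitlines()[0]    # the one-line summary
print(summary)
```

In an interactive session, help(roots) displays the whole docstring.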


Example E2.26 In Python, functions are “first class” objects: they can have variable 
identifiers assigned to them, they can be passed as arguments to other functions, and 
they can even be returned from other functions. A function is given a name when it 
is defined, but that name can be reassigned to refer to a different object (don’t do this
unless you mean to!) if desired.

As the following example demonstrates, it is possible for more than one variable
name to be assigned to the same function object.

>>> import math
>>> def cosec(x):
...     """Return the cosecant of x, cosec(x) = 1/sin(x)."""
...     return 1./math.sin(x)
...
>>> cosec
<function cosec at 0x100375170>
>>> cosec(math.pi/4)
1.4142135623730951
>>> csc = cosec                                     # ❶
>>> csc
<function cosec at 0x100375170>
>>> csc(math.pi/4)
1.4142135623730951

❶ The assignment csc = cosec associates the identifier (variable name) csc with
the same function object as the identifier cosec: this function can then be called with
csc() as well as with cosec().

³¹ For larger projects, docstrings document an application programming interface (API) for the project.
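Because functions are objects, they can also be handed to other functions as arguments. The following sketch assumes a hypothetical helper, apply_to_all, which is not part of the original example; it calls whichever function it is given on each item of a list.

```python
import math

def cosec(x):
    """Return the cosecant of x, cosec(x) = 1/sin(x)."""
    return 1. / math.sin(x)

def apply_to_all(func, values):
    """Call func on each item of values and return the results as a list."""
    results = []
    for value in values:
        results.append(func(value))
    return results

csc = cosec                         # a second name for the same object
same_object = csc is cosec
square_roots = apply_to_all(math.sqrt, [4., 9., 16.])
cosecants = apply_to_all(csc, [math.pi/2, math.pi/6])
print(same_object, square_roots, cosecants)
```

Note that the function is passed by name, without parentheses: writing csc() would call it instead.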


2.7.2 Default and keyword arguments

Keyword arguments 

In the previous example, the arguments have been passed to the function in the order in 
which they are given in the function’s definition (these are called positional arguments). 
It is also possible to pass the arguments in an arbitrary order by setting them explicitly 
as keyword arguments : 

roots(a=1., c=-6., b=-1.)
roots(b=-1., a=1., c=-6.)

If you mix nonkeyword (positional) and keyword arguments, the former must come first;
otherwise Python won’t know to which variable the positional argument corresponds:

>>> roots(1., c=-6., b=-1.)     # OK
(3.0, -2.0)
>>> roots(b=-1., 1., -6.)       # Oops: which is a and which is c?
  File "<stdin>", line 1
SyntaxError: non-keyword arg after keyword arg


Default arguments 

Sometimes you want to define a function that takes an optional argument: if the caller
doesn’t provide a value for this argument, a default value is used. Default arguments are
set in the function definition:

>>> def report_length(value, units='m'):
...     return 'The length is {:.2f} {}'.format(value, units)
...
>>> report_length(33.136, 'ft')
'The length is 33.14 ft'
>>> report_length(10.1)
'The length is 10.10 m'

Default arguments are assigned when the Python interpreter first encounters the
function definition. This can lead to some unexpected results, particularly for mutable
arguments. For example,

>>> def func(alist=[]):
...     alist.append(7)
...     return alist
...
>>> func()
[7]
>>> func()
[7, 7]
>>> func()
[7, 7, 7]

The default argument to the function func here is an empty list, but it is the specific
empty list assigned when the function is defined. Therefore, each time func is called
this specific list grows.
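A widely used way to avoid this behavior, sketched below, is to make the default None and create the list inside the function. This is a common idiom rather than something prescribed by the text above.

```python
def func(alist=None):
    # None is immutable, so it is safe as a default; a fresh list is
    # created on every call that does not supply alist.
    if alist is None:
        alist = []
    alist.append(7)
    return alist

first = func()
second = func()
mine = func([1, 2])
print(first, second, mine)
```

Each call without an argument now returns a new single-item list, while a list supplied by the caller is still modified in place.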


Example E2.27 Default argument values are assigned when the function is defined.
Therefore, if a function is defined with an argument defaulting to the value of some
variable, subsequently changing that variable will not change the default:

>>> default_units = 'm'
>>> def report_length(value, units=default_units):
...     return 'The length is {:.2f} {}'.format(value, units)
...
>>> report_length(10.1)
'The length is 10.10 m'
>>> default_units = 'cubits'
>>> report_length(10.1)
'The length is 10.10 m'

The default units used by the function report_length are unchanged by the reassignment
of the variable name default_units: the default value is set to the string object
referred to by default_units ('m') when the def statement is encountered by the Python
interpreter, and it is not affected by any later reassignment of that name.


2.7.3 Scope 

A function can define and use its own variables. When it does so, those variables are
local to that function: they are not available outside the function. Conversely, variables
assigned outside all function defs are global and are available everywhere within the
program file. For example,

>>> def func():
...     a = 5
...     print(a, b)
...
>>> b = 6
>>> func()
5 6

The function func defines a variable a, but prints out both a and b. Because the variable
b isn’t defined in the local scope of the function, Python looks in the global scope, where
it finds b = 6, so that is what is printed. It doesn’t matter that b hasn’t been defined
when the function is defined, but of course it must be before the function is called.

What happens if a function defines a variable with the same name as a global variable?
In this case, within the function the local scope is searched first when resolving variable
names, so it is the object pointed to by the local variable name that is retrieved. For
example,

>>> def func():
...     a = 5
...     print(a)
...
>>> a = 6
>>> func()
5
>>> print(a)
6


Note that the local variable a exists only within the body of the function; it just 
happens to have the same name as the global variable a. It disappears after the function 
exits and it doesn’t overwrite the global a. 

Python’s rules for resolving scope can be summarized as “LEGB”: first local scope,
then enclosing scope (for nested functions), then global scope, and finally built-ins. If
you happen to give a variable the same name as a built-in function (such as range or
len), then that name resolves to your variable (in local or global scope) and not to the
original built-in. It is therefore generally not a good idea to name your variables after
built-ins.
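The effect of hiding a built-in name can be seen in a short sketch. The two function names below are illustrative; the second deliberately assigns to the name len in its local scope.

```python
def n_items(words):
    return len(words)               # len is found last, among the built-ins

def n_items_shadowed(words):
    len = 2                         # a local name len now hides the built-in
    try:
        return len(words)           # this tries to call the integer 2
    except TypeError as err:
        return 'TypeError: ' + str(err)

print(n_items(['red', 'green']))
print(n_items_shadowed(['red', 'green']))
```

The built-in itself is untouched: only the lookup inside n_items_shadowed stops at the local name before reaching it.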

♢ The global and nonlocal keywords

We have seen that it is possible to access variables defined in scopes other than the local
function’s. Is it possible to modify them (“rebind” them to new objects)? Consider the
distinction between the behavior of the following functions:

>>> def func1():
...     print(x)    # OK, providing x is defined in global or enclosing scope
...
>>> def func2():
...     x += 1      # Not OK: can't modify x if it isn't local
...
>>> x = 4
>>> func1()
4
>>> func2()
UnboundLocalError: local variable 'x' referenced before assignment

If you really do want to change variables that are defined outside the local scope, you
must first declare within the function body that this is your intention with the keywords
global (for variables in global scope) and nonlocal (for variables in enclosing scope,
for example, where a function is defined within another). In the previous case:

>>> def func2():
...     global x
...     x += 1      # OK now - Python knows we mean x in global scope
...
>>> x = 4
>>> func2()         # No error
>>> x
5

The function func2 really has changed the value of the variable x in global scope.

You should think carefully whether it is really necessary to use this technique (would
it be better to pass x as an argument and return its updated value from the function?).
Especially in longer programs, variable names in one scope that change value (or even
type!) within functions lead to confusing code, behavior that is hard to predict and tricky
bugs.
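The nonlocal keyword plays the same role for a variable in an enclosing function as global does for one in global scope. The sketch below uses hypothetical names (make_counter, increment, incremented) and also shows the alternative suggested above of passing a value in and returning the updated value.

```python
def make_counter():
    count = 0                   # local to make_counter, enclosing for increment

    def increment():
        nonlocal count          # rebind count in the enclosing scope
        count += 1
        return count

    return increment

def incremented(x):
    """The alternative: take the value as an argument and return a new one."""
    return x + 1

counter = make_counter()
values = [counter(), counter(), counter()]

x = 4
x = incremented(x)              # no global statement is needed
print(values, x)
```

Without the nonlocal declaration, the statement count += 1 would raise UnboundLocalError, for the same reason as in func2.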


Example E2.28 Take a moment to study the following code and predict the result
before running it.

Listing 2.5 Python scope rules

# eg2-scope.py

def outer_func():
    def inner_func():
        a = 9
        print('inside inner_func, a is {:d} (id={:d})'.format(a, id(a)))
        print('inside inner_func, b is {:d} (id={:d})'.format(b, id(b)))
        print('inside inner_func, len is {:d} (id={:d})'.format(len,id(len)))

    len = 2
    print('inside outer_func, a is {:d} (id={:d})'.format(a, id(a)))
    print('inside outer_func, b is {:d} (id={:d})'.format(b, id(b)))
    print('inside outer_func, len is {:d} (id={:d})'.format(len,id(len)))
    inner_func()

a, b = 6, 7
outer_func()
print('in global scope, a is {:d} (id={:d})'.format(a, id(a)))
print('in global scope, b is {:d} (id={:d})'.format(b, id(b)))
print('in global scope, len is', len, '(id={:d})'.format(id(len)))


This program defines a function, inner_func, nested inside another, outer_func.
After these definitions, the execution proceeds as follows:

1. Global variables a=6 and b=7 are initialized. 

2. outer_func is called: 

a. outer_func defines a local variable, len=2. 

b. The values of a and b are printed; they don’t exist in local scope and there 
isn’t any enclosing scope, so Python searches for and finds them in global 
scope: their values (6 and 7) are output. 

c. The value of local variable len (2) is printed. 

d. inner_func is called: 

(1) A local variable, a=9 is defined. 

(2) The value of this local variable is printed. 

(3) The value of b is printed; b doesn’t exist in local scope so Python
looks for it in enclosing scope, that of outer_func. It isn’t found
there either, so Python proceeds to look in global scope where it is
found: the value b=7 is printed.

(4) The value of len is printed: len doesn’t exist in local scope, but it 
is in the enclosing scope since len=2 is defined in outer_func: its 
value is output. 

3. After outer_func has finished execution, the values of a and b in global scope 
are printed. 

4. The value of len is printed. This is not defined in global scope, so Python 
searches its own built-in names: len is the built-in function for determining the 
lengths of sequences. This function is itself an object and it provides a short string 
description of itself when printed. 

inside outer_func, a is 6 (id=232)
inside outer_func, b is 7 (id=264)
inside outer_func, len is 2 (id=104)
inside inner_func, a is 9 (id=328)
inside inner_func, b is 7 (id=264)
inside inner_func, len is 2 (id=104)
in global scope, a is 6 (id=232)
in global scope, b is 7 (id=264)
in global scope, len is <built-in function len> (id=977)

Note that in this example outer_func has (perhaps unwisely) redefined
(re-bound) the name len to the integer object 2. This means that the original len
built-in function is not available within this function (and neither is it available within
the enclosed function, inner_func).


2.7.4 ♢ Passing arguments to functions

A common question from new users of Python who come to it with a knowledge of other 
computer languages is, are arguments to functions passed “by value” or “by reference?” 
In other words, does the function make its own copy of the argument, leaving the 
caller’s copy unchanged, or does it receive a “pointer” to the location in memory of the 
argument, the contents of which the function can change? The distinction is important 
for languages such as C, but does not fit well into the Python name-object model. Python 
function arguments are sometimes (not very helpfully) said to be “references, passed by 
value.” Recall that everything in Python is an object, and the same object may have 
multiple identifiers (what we have been loosely calling “variables” up until now). When 
a name is passed to a function, the “value” that is passed is, in fact, the object it points to.
Whether the function can change the object or not (from the point of view of the caller) 
depends on whether the object is mutable or immutable. 

A couple of examples should make this clearer. A simple function, func1, taking an
integer argument, receives a reference to that integer object, to which it attaches a local
name (which may or may not be the same as the global name). The function cannot
change the integer object (which is immutable), so any reassignment of the local name
simply points to a new object: the global name still points to the original integer object.

>>> def func1(a):
...     print('func1: a = {}, id = {}'.format(a, id(a)))
...     a = 7       # reassigns local a to the integer 7
...     print('func1: a = {}, id = {}'.format(a, id(a)))
...
>>> a = 3
>>> print('global: a = {}, id = {}'.format(a, id(a)))
global: a = 3, id = 4297242592
>>> func1(a)
func1: a = 3, id = 4297242592
func1: a = 7, id = 4297242720
>>> print('global: a = {}, id = {}'.format(a, id(a)))
global: a = 3, id = 4297242592

func1 therefore prints 3 (inside the function, a is initially the local name for the original
integer object); it then prints 7 (this local name now points to a new integer object, with a
new id) - see Figure 2.5. After it returns, the global name a still points to the original 3.

Now consider passing a mutable object, such as a list, to a function, func2. This
time, a change made to the list inside the function modifies the original object, and
these changes persist after the function call.

>>> def func2(b):
...     print('func2: b = {}, id = {}'.format(b, id(b)))
...     b.append(7)     # add an item to the list
...     print('func2: b = {}, id = {}'.format(b, id(b)))
...
>>> c = [1, 2, 3]
>>> print('global: c = {}, id = {}'.format(c, id(c)))
global: c = [1, 2, 3], id = 4361122448
>>> func2(c)
func2: b = [1, 2, 3], id = 4361122448
func2: b = [1, 2, 3, 7], id = 4361122448
>>> print('global: c = {}, id = {}'.format(c, id(c)))
global: c = [1, 2, 3, 7], id = 4361122448

Note that it doesn’t matter what name is given to the list by the function: this name
points to the same object, as you can see from its id. The relationship between the
variable names and objects is illustrated in Figure 2.6.

Figure 2.5 Immutable objects. Within func1: (a) before reassigning the local variable a and (b)
after reassigning the value of local variable a.

Figure 2.6 Mutable objects. Within func2: (a) before appending to the list pointed to by both
global variable c and local variable b and (b) after appending to the list with b.append(7).

So are Python arguments passed by value or by reference? The best answer is probably
that arguments are passed by value, but that value is a reference to an object (which
can be mutable or immutable).
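The distinction that matters in practice is between rebinding a local name and modifying the object it refers to. The following sketch contrasts the two for a list; the function names rebind and mutate are illustrative only.

```python
def rebind(items):
    items = items + [7]     # builds a new list; only the local name is rebound
    return items

def mutate(items):
    items.append(7)         # modifies the caller's list object in place

c = [1, 2, 3]
new_list = rebind(c)
after_rebind = list(c)      # a copy recording that c is unchanged so far
mutate(c)
print(new_list, after_rebind, c)
```

Even though a list is mutable, rebind leaves the caller's list alone because it never modifies the object it was given.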


Example E2.29 The Lazy Caterer’s Sequence, f(n), describes the maximum number of
pieces a circular pizza can be divided into with an increasing number of cuts, n. Clearly
f(0) = 1, f(1) = 2 and f(2) = 4. For n = 3, f(3) = 7 (the maximum number of
pieces is formed if the cuts do not intersect at a common point). It can be shown that
the general recursion formula,

f(n) = f(n − 1) + n,

applies. Although there is a closed form for this sequence, f(n) = ½(n² + n + 2), we
could also define a function to grow a list of consecutive values in the sequence:

>>> def f(seq):
...     seq.append(seq[-1] + n)
...
>>> seq = [1]       # f(0) = 1
>>> for n in range(1, 16):
...     f(seq)
...
>>> print(seq)
[1, 2, 4, 7, 11, 16, 22, 29, 37, 46, 56, 67, 79, 92, 106, 121]

The list seq is mutable and so grows in place each time the function f() is called.
The n referred to within this function is the name found in global scope (the for loop
counter).


2.7.5 Recursive functions

A function that can call itself is called a recursive function. Recursion is not always
necessary but can lead to elegant algorithms in some situations.³² For example, one
way to calculate the factorial of an integer n ≥ 1 is to define the following recursive
function:

>>> def factorial(n):
...     if n == 1:
...         return 1
...     return n * factorial(n-1)
...
>>> factorial(5)
120

Here, a call to factorial(n) returns n times whatever is returned by the call to
factorial(n-1), which returns n − 1 times the returned value of factorial(n-2)
and so on until factorial(1), which is 1 by definition. That is, the algorithm makes
use of the fact that n! = n · (n − 1)! Care should be taken in implementing such recursive
algorithms to ensure that they stop when some condition is met.³³
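For comparison, the same calculation can be written with a loop, and the limit that Python places on the depth of recursion can be observed directly. In this sketch, factorial_iter is a hypothetical name for the iterative version; sys.getrecursionlimit reports the current limit, which is commonly 1000 but may differ between installations.

```python
import sys

def factorial(n):
    """Recursive factorial for an integer n >= 1."""
    if n == 1:
        return 1
    return n * factorial(n-1)

def factorial_iter(n):
    """Iterative equivalent: multiply together the integers 2 to n."""
    result = 1
    for i in range(2, n+1):
        result *= i
    return result

limit = sys.getrecursionlimit()
try:
    factorial(limit + 100)      # needs more nested calls than are allowed
    too_deep = False
except RecursionError:
    too_deep = True

print(factorial(5), factorial_iter(5), too_deep)
```

The iterative version has no such restriction, since it makes only a single function call whatever the value of n.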


Example E2.30 The famous Tower of Hanoi problem involves three poles, one of
which (pole A) is stacked with n differently sized, circular discs, in decreasing order of
diameter with the largest at the bottom. The task is to move the stack to the third pole
(pole C) by moving one disc at a time in such a way that a larger disc is never placed on
a smaller one. It is necessary to use the second pole (pole B) as an intermediate resting
place for the discs.

The problem can be solved using the following recursive algorithm. Label the discs
D_i with D_1 the smallest disc and D_n the largest.

• Move discs D_1, D_2, ..., D_{n−1} from A to B;
• Move disc D_n from A to C;
• Move discs D_1, D_2, ..., D_{n−1} from B to C.

The second step is a single move, but the first and last require the movement of a stack
of n − 1 discs from one peg to another - which is exactly what the algorithm itself
solves!

In the following code, we identify the discs by the integers 1, 2, 3, ... stored in one of
three lists, A, B and C. The initial state of the system, with all discs on pole A is denoted
by, for example, A = [5, 4, 3, 2, 1] where the first indexed item is the “bottom” of the
pole and the last indexed item is the “top.” The rules of the problem require that these
lists must always be decreasing sequences.


³² In fact, because of the overhead involved in making a function call, a recursive algorithm can be expected
to be slower than a well-designed iterative one.

³³ In practice, an infinite loop is not possible because of the memory overhead involved in each function call,
and Python sets a maximum recursion limit.

Listing 2.6 The Tower of Hanoi problem

# eg2-hanoi.py

def hanoi(n, P1, P2, P3):
    """ Move n discs from pole P1 to pole P3. """
    if n == 0:
        # No more discs to move in this step
        return

    global count
    count += 1

    # move n-1 discs from P1 to P2
    hanoi(n-1, P1, P3, P2)

    if P1:
        # move disc from P1 to P3
        P3.append(P1.pop())
        print(A, B, C)

    # move n-1 discs from P2 to P3
    hanoi(n-1, P2, P1, P3)

# Initialize the poles: all n discs are on pole A.
n = 3
A = list(range(n,0,-1))
B, C = [], []
print(A, B, C)
count = 0
hanoi(n, A, B, C)
print(count)


Note that the hanoi function just moves a stack of discs from one pole to another:
lists (representing the poles) are passed into it in some order, and it moves the discs
from the pole represented by the first list, known locally as P1, to that represented by
the third (P3). It does not need to know which list is A, B or C.
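The same recursion can be written without a global counter by returning the moves instead of printing them. The following is a sketch under that assumption: the function name hanoi_moves and its parameters source, spare and target are illustrative and do not appear in the listing. It also checks the well-known result that n discs require 2**n − 1 moves.

```python
def hanoi_moves(n, source='A', spare='B', target='C'):
    """Return the list of (from, to) moves that shifts n discs."""
    if n == 0:
        return []
    return (hanoi_moves(n-1, source, target, spare)     # clear the way
            + [(source, target)]                        # move the largest disc
            + hanoi_moves(n-1, spare, source, target))  # restack on top of it

moves = hanoi_moves(3)
print(len(moves), moves)

# The number of moves for n discs is 2**n - 1.
for n in range(1, 8):
    assert len(hanoi_moves(n)) == 2**n - 1
```

Returning a value rather than modifying a global variable makes the function easier to test and to reuse.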


2.7.6 Exercises 
Questions 


Q2.7.1 The following small programs each attempt to output the simple sum:

   56
  +44
-----
  100
-----

Which two programs work as intended? Explain carefully what is wrong with each of
the others.

a. def line():
       '-----'

   my_sum = '\n'.join(['   56', '  +44', line(), '  100', line()])
   print(my_sum)

b. def line():
       return '-----'

   my_sum = '\n'.join(['   56', '  +44', line(), '  100', line()])
   print(my_sum)

c. def line():
       return '-----'

   my_sum = '\n'.join(['   56', '  +44', line, '  100', line])
   print(my_sum)

d. def line():
       print('-----')

   print('   56')
   print('  +44')
   print(line)
   print('  100')
   print(line)

e. def line():
       print('-----')

   print('   56')
   print('  +44')
   print(line())
   print('  100')
   print(line())

f. def line():
       print('-----')

   print('   56')
   print('  +44')
   line()
   print('  100')
   line()


Q2.7.2 The following code snippet attempts to calculate the balance of a savings
account with an annual interest rate of 5% after 4 years, if it starts with a balance of
$100.

>>> balance = 100
>>> def add_interest(balance, rate):
...     balance += balance * rate / 100
...
>>> for year in range(4):
...     add_interest(balance, 5)
...     print('Balance after year {}: ${:.2f}'.format(year+1, balance))
...
Balance after year 1: $100.00
Balance after year 2: $100.00
Balance after year 3: $100.00
Balance after year 4: $100.00

Explain why this doesn’t work and then provide a working alternative. 

Q2.7.3 A Harshad number is an integer that is divisible by the sum of its digits (e.g.,
21 is divisible by 2 + 1 = 3 and so is a Harshad number). Correct the following code,
which should return True or False if n is a Harshad number or not respectively:

def digit_sum(n):
    """ Find the sum of the digits of integer n. """

    s_digits = list(str(n))
    dsum = 0
    for s_digit in s_digits:
        dsum += int(s_digit)

def is_harshad(n):
    return not n % digit_sum(n)

When run, the function is_harshad raises an error:

>>> is_harshad(21)
TypeError: unsupported operand type(s) for %: 'int' and 'NoneType'


Problems 

P2.7.1 The word game Scrabble is played on a 15 × 15 grid of squares referred to by
a row index letter (A–O) and a column index number (1–15). Write a function to
determine whether a word will fit in the grid, given the position of its first letter as a
string (e.g., 'G7'), a variable indicating whether the word is placed to read across or
down the grid, and the word itself.

P2.7.2 Write a program to find the smallest positive integer, n, whose factorial is not 
divisible by the sum of its digits. For example, 6 is not such a number because 6! = 720 
and 7 + 2 + 0 = 9 divides 720. 

P2.7.3 Write two functions which, given two lists of length 3 representing
three-dimensional vectors a and b, calculate the dot product, a · b, and the vector (cross)
product, a × b.

Write two more functions to return the scalar triple product, a · (b × c), and the vector
triple product, a × (b × c).

P2.7.4 A right regular pyramid with height h and a base consisting of a regular
n-sided polygon of side length s has a volume, $V = \frac{1}{3}Ah$, and total surface area,
$S = A + \frac{1}{2}nsl$, where A is the base area and l the slant height, which may be calculated
from the apothem of the base polygon, $a = \frac{1}{2}s\cot\frac{\pi}{n}$, as $A = \frac{1}{2}nsa$ and $l = \sqrt{h^2 + a^2}$.
Use these formulas to define a function, pyramid_av, returning V and S when passed
values for n, s and h.


P2.7.5 The range of a projectile launched at an angle $\alpha$ and speed v on flat terrain is

$$R = \frac{v^2 \sin 2\alpha}{g},$$

where g is the acceleration due to gravity, which may be taken to be 9.81 m s$^{-2}$ for
Earth. The maximum height attained by the projectile is given by

$$H = \frac{v^2 \sin^2\alpha}{2g}.$$

(We neglect air resistance and the curvature and rotation of the Earth.) Write a function
to calculate and return the range and maximum height of a projectile, taking $\alpha$ and v as
arguments. Test it with the values v = 10 m s$^{-1}$ and $\alpha$ = 30°.


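P2.7.6 Write a function, sinm_cosn, which returns the value of the following definite
integral for integers m, n > 1.

$$\int_0^{\pi/2} \sin^n\theta \cos^m\theta \,\mathrm{d}\theta =
\begin{cases}
\dfrac{(m-1)!!\,(n-1)!!}{(m+n)!!}\,\dfrac{\pi}{2} & m, n \text{ both even},\\[2ex]
\dfrac{(m-1)!!\,(n-1)!!}{(m+n)!!} & \text{otherwise.}
\end{cases}$$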


Hint: for calculating the double factorial, see Exercise P2.4.6. 


P2.7.7 Write a function that determines if a string is a palindrome (that is, reads the 
same backward as forward) using recursion. 


P2.7.8 Tetration may be thought of as the next operator after exponentiation. Thus,
where $x \times n$ can be written as the sum $x + x + x + \cdots + x$ with n terms, and $x^n$ is the
multiplication of n factors, $x \cdot x \cdot x \cdots x$, the expression written ${}^n x$ is equal to the repeated
exponentiation involving n occurrences of x:

$${}^n x = x^{x^{\cdot^{\cdot^{x}}}}$$

For example, ${}^4 2 = 2^{2^{2^2}} = 2^{2^4} = 2^{16} = 65536$. Note that the exponential “tower” is
evaluated from top to bottom.

Write a recursive function to calculate ${}^n x$ and test it (for small, positive real values of
x and non-negative integers n: tetration generates very large numbers)!

How many digits are there in ${}^3 5$? In ${}^5 2$?

3 Interlude: simple plotting with pylab


As Python has grown in popularity, many libraries of packages and modules have 
become available to extend its functionality in useful ways; Matplotlib is one such 
library. Matplotlib provides a means of producing graphical plots that can be embedded 
into applications, displayed on the screen or output as high-quality image files for 
publication. 

Matplotlib has a fully fledged object-oriented interface, which is described in more
detail in Chapter 7, but for simple plotting in an interactive shell session, its simpler,
procedural pylab interface provides a convenient way of visualizing data. pylab is
designed to be easy to learn and functions in a similar way to comparable tools in the
commercial MATLAB package.

On a system with Matplotlib installed, the pylab package is imported with

>>> import pylab

even though this means prefacing all of the pylab method calls with “pylab.”¹


3.1 Basic plotting 

3.1.1 Line plots and scatterplots 

The simplest (x, y) line plot is achieved by calling pylab.plot with two iterable
objects of the same length (typically lists of numbers or NumPy arrays). For example,

>>> ax = [0., 0.5, 1.0, 1.5, 2.0, 2.5, 3.0] 

>>> ay = [0.0, 0.25, 1.0, 2.25, 4.0, 6.25, 9.0] 

>>> pylab.plot(ax,ay) 

>>> pylab.show() 

pylab.plot creates a matplotlib object (here, a Line2D object) and pylab.show()
displays it on the screen. Figure 3.1 shows the result; by default the line will be in blue.

To plot (x, y) points as a scatterplot rather than as a line plot, call pylab.scatter
instead:

>>> import random
>>> ax, ay = [], []
>>> for i in range(100):
...     ax.append(random.random())
...     ay.append(random.random())
...
>>> pylab.scatter(ax,ay)
>>> pylab.show()

Figure 3.1 A basic (x, y) line plot.

Figure 3.2 A basic scatter plot.

1 It is better to avoid polluting the global namespace, as happens when importing with from pylab import *.

The resulting plot is shown in Figure 3.2. 

We can also save the plot as an image by calling pylab.savefig(filename). The
desired image format is deduced from the filename extension. For example,

pylab.savefig('plot.png')    # save as a PNG image
pylab.savefig('plot.pdf')    # save as PDF
pylab.savefig('plot.eps')    # save in Encapsulated PostScript format


Example E3.1 As an example, let’s plot the function $y = \sin^2 x$ for $-2\pi \le x \le 2\pi$.
Using only the Python we’ve covered in the previous chapter, here is one approach:

We calculate and plot 1,000 (x, y) points, and store them in the lists x and y. To
set up the x list as the abscissa, we can’t use range directly because that function only
produces integer sequences, so first we work out the spacing between each x value as

$$\Delta x = \frac{x_\mathrm{max} - x_\mathrm{min}}{n - 1}$$

(if our n values are to include $x_\mathrm{min}$ and $x_\mathrm{max}$, there are n − 1 intervals of width $\Delta x$); the
abscissa points are then

$$x_i = x_\mathrm{min} + i\,\Delta x \quad \text{for } i = 0, 1, 2, \ldots, n - 1.$$

The corresponding y-axis points are

$$y_i = \sin^2(x_i).$$

The following program implements this approach, and plots the (x, y) points on a simple
line-graph (see Figure 3.3).

Listing 3.1 Plotting $y = \sin^2 x$

# eg3-sin2x.py

import math
import pylab

xmin, xmax = -2. * math.pi, 2. * math.pi
n = 1000
x = [0.] * n
y = [0.] * n
dx = (xmax - xmin)/(n-1)
for i in range(n):
    xpt = xmin + i * dx
    x[i] = xpt
    y[i] = math.sin(xpt)**2

pylab.plot(x,y)
pylab.show()


3.1.2 linspace and vectorization 

Plotting the simple function $y = \sin^2 x$ in the previous example involved quite a lot of
work, almost all of it to do with setting up the lists x and y. In fact, pylab provides
some of the same functionality as the NumPy library introduced in Chapter 6, which
can be used to make life much easier.

Figure 3.3 A plot of $y = \sin^2 x$.


First, the regularly spaced grid of x-coordinates, x, can be created using linspace.
This is much like a floating point version of the range built-in: it takes a start value,
an end value, and the number of values in the sequence and generates an array of
values representing the arithmetic progression between (and inclusive of) the two
values. For example, x = pylab.linspace(-5, 5, 1001) creates the sequence:
-5.0, -4.99, -4.98, ..., 4.99, 5.0.
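
To make the arithmetic progression concrete, here is a minimal pure-Python sketch of the idea behind linspace. It is an illustration only, not pylab's implementation: it returns a plain list rather than an array, and the function defined here merely borrows the name.

```python
def linspace(start, stop, num):
    """Return num evenly spaced floats from start to stop, inclusive."""
    if num < 2:
        raise ValueError("num must be at least 2")
    # num values enclose num - 1 intervals of equal width
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

x = linspace(-5, 5, 1001)
print(len(x), x[0], x[-1])
```

Because the end value is included, 1,001 points are needed to obtain a spacing of exactly 0.01 across an interval of width 10.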

Second, the pylab equivalents of the math module’s functions can act on iterable
objects (such as lists or NumPy arrays). Thus, y = pylab.sin(x) creates a
sequence of values (actually, a NumPy ndarray), which are $\sin(x_i)$ for each value
$x_i$ in the array x:

import math
import pylab

n = 1000
xmin, xmax = -2. * math.pi, 2. * math.pi
x = pylab.linspace(xmin, xmax, n)
y = pylab.sin(x)**2
pylab.plot(x,y)
pylab.show()

This is called vectorization and is described in more detail in Section 6.1.3. Lists 
and tuples can be turned into array objects supporting vectorization with the array 
constructor method: 

>>> w = [1.0, 2.0, 3.0, 4.0] 

>>> w = pylab.array(w) 

>>> w * 100 # multiply each element by 100 

array([ 100., 200., 300., 400.]) 
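
For comparison, this short sketch shows what the same operations do to an ordinary Python list: element-wise scaling needs an explicit loop or list comprehension, and the * operator repeats the list instead of multiplying its elements. The names scaled and repeated are introduced here purely for illustration.

```python
w = [1.0, 2.0, 3.0, 4.0]

# Element-wise multiplication of a plain list needs an explicit comprehension
scaled = [value * 100 for value in w]
print(scaled)

# For a list, * means repetition, not arithmetic
repeated = w * 2
print(repeated)
```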

To add a second line to the plot, simply call pylab.plot again:

x = pylab.linspace(xmin, xmax, n)
y1 = pylab.sin(x)**2
y2 = pylab.cos(x)**2
pylab.plot(x,y1)
pylab.plot(x,y2)
pylab.show()

Note that after a plot has been displayed with show or saved with savefig, it is no
longer available to display a second time: to do this it is necessary to call pylab.plot
again. This is because of the procedural nature of the pylab interface: each call to a
pylab method changes the internal state of the plot object. The plot object is built up
by successive calls to such methods (adding lines, legends and labels, setting the axis
limits, etc.), and then the plot object is displayed or saved.

Figure 3.4 A plot of y = sinc(x).


Example E3.2 The sinc function is the function

$$\mathrm{sinc}(x) = \frac{\sin x}{x}.$$

To plot it over $-20 \le x \le 20$:

>>> x = pylab.linspace(-20, 20, 1001)
>>> y = pylab.sin(x)/x
__main__:1: RuntimeWarning: invalid value encountered in true_divide
>>> pylab.plot(x,y)
>>> pylab.show()

Note that even though Python warns of the division by zero at x = 0, the function is
plotted correctly: the singular point is set to the special value nan (standing for “not a
number”) and is omitted from the plot (Figure 3.4).

>>> y[498:503]
array([ 0.99893367,  0.99973335,         nan,  0.99973335,  0.99893367])
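
If the value at the origin matters (rather than just the appearance of the plot), one option is to handle x = 0 explicitly. The following is a minimal sketch using only the math module; the function name sinc is chosen for this illustration and is not a pylab call.

```python
import math

def sinc(x):
    """Unnormalized sinc function, sin(x)/x, with its limiting value at x = 0."""
    if x == 0:
        # sin(x)/x tends to 1 as x tends to 0
        return 1.0
    return math.sin(x) / x

print(sinc(0.0))
print(sinc(0.04))
```

The value at x = 0.04 can be compared with the neighbours of nan in the array output above, since the grid spacing in that example is 0.04.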
3.1.3 Exercises
Problems 

P3.1.1 Plot the functions

$$f_1(x) = \ln\left(\frac{1}{\cos^2 x}\right) \quad \text{and} \quad f_2(x) = \ln\left(\frac{1}{\sin^2 x}\right)$$

on 1,000 points across the range $-20 \le x \le 20$. What happens to these functions at
$x = n\pi/2$ ($n = 0, \pm 1, \pm 2, \ldots$)? What happens in your plot of them?

P3.1.2 The Michaelis-Menten equation models the kinetics of enzymatic reactions as

$$v = \frac{\mathrm{d}[\mathrm{P}]}{\mathrm{d}t} = \frac{V_\mathrm{max}[\mathrm{S}]}{K_m + [\mathrm{S}]},$$

where v is the rate of the reaction converting the substrate, S, to product P, catalyzed
by the enzyme. $V_\mathrm{max}$ is the maximum rate (when all the enzyme is bound to S) and the
Michaelis constant, $K_m$, is the substrate concentration at which the reaction rate is at
half its maximum value.

Plot v against [S] for a reaction with $K_m$ = 0.04 M and $V_\mathrm{max}$ = 0.1 M s$^{-1}$. Look
ahead to the next section if you want to label the axes.

P3.1.3 The normalized Gaussian function centered at x = 0 is

$$g(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right).$$

Plot and compare the shapes of these functions for standard deviations $\sigma$ = 1, 1.5 and 2.

3.2 Labels, legends and customization

3.2.1 Labels and legends

Plot legend

Each line on a pylab plot can be given a label by passing a string object to its
label argument. However, the label won’t appear on the plot unless you also call
pylab.legend to add a legend:

pylab.plot(ax, ay1, label='sin^2(x)')
pylab.legend()
pylab.show()

The location of the legend is, by default, the top right-hand corner of the plot but can
be customized by setting the loc argument to the legend method to either the string
or integer values given in Table 3.1.
Table 3.1 Legend location specifiers

String            Integer
'best'            0
'upper right'     1
'upper left'      2
'lower left'      3
'lower right'     4
'right'           5
'center left'     6
'center right'    7
'lower center'    8
'upper center'    9
'center'          10


The plot title and axis labels

A plot can be given a title above the axes by calling pylab.title and passing the
title as a string. Similarly, the methods pylab.xlabel and pylab.ylabel control the
labeling of the x- and y-axes: just pass the label you want as a string to these methods.
The optional additional argument fontsize sets the font size in points. For example,

t = pylab.linspace(0., 0.1, 1000)
Vp_uk, Vp_us = 230, 110
f_uk, f_us = 50, 60

V_uk = Vp_uk * pylab.sin(2 * pylab.pi * f_uk * t)    # ➊
V_us = Vp_us * pylab.sin(2 * pylab.pi * f_us * t)

pylab.plot(t*1000, V_uk, label='UK')                 # ➋
pylab.plot(t*1000, V_us, label='US')
pylab.title('A comparison of AC voltages in the UK and US')
pylab.xlabel('Time /ms', fontsize=16.)
pylab.ylabel('Voltage /V', fontsize=16.)
pylab.legend()
pylab.show()

➊ We calculate the voltage as a function of time (t, in seconds) in the United Kingdom
and in the United States, which have different peak voltages (230 V and 110 V
respectively) and different frequencies (50 Hz and 60 Hz).

➋ The time is plotted on the x-axis in milliseconds (t*1000); see Figure 3.5.


Using LaTeX in pylab

You can use LaTeX markup in pylab plots, but this option needs to be enabled in
Matplotlib’s “rc settings,” as follows:

pylab.rc('text', usetex=True)

Then simply pass the LaTeX markup as a string to any label you want displayed in
this way. Remember to use raw strings (r'xxx') to prevent Python from escaping any
characters following LaTeX’s backslashes (see Section 2.3.2).
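
The effect of the raw-string prefix can be checked without any plotting at all. In this small sketch (the variable names plain and raw are illustrative), the LaTeX command \tau is chosen because \t is also Python's escape sequence for a tab character.

```python
plain = '$\tau$'     # Python turns \t into a single tab character
raw = r'$\tau$'      # the raw string keeps the backslash and the letter t

print(len(plain), len(raw))
print('\t' in plain, '\\' in raw)
```

Only the raw string still contains the backslash that LaTeX needs to recognize the command.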
Figure 3.5 A comparison of AC voltages in the United Kingdom and United States.


Example E3.3 To plot the functions $f_n(x) = x^n \sin x$ for n = 1, 2, 3, 4:

import pylab
pylab.rc('text', usetex=True)

x = pylab.linspace(-10,10,1001)
for n in range(1,5):
    y = x**n * pylab.sin(x)
    y /= max(y)    # ➊
    pylab.plot(x,y, label=r'$x^{}\sin x$'.format(n))
pylab.legend(loc='lower center')
pylab.show()

➊ To make the graphs easier to compare, they have been scaled to a maximum of 1 in
the region considered.

The graph produced is given in Figure 3.6.


3.2.2 Customizing plots 

Markers 

By default, plot produces a line-graph with no markers at the plotted points. To add
a marker on each point of the plotted data, use the marker argument. Several different
markers are available and are documented online;² some of the more useful ones are
listed in Table 3.2.

Colors 

The color of a plotted line and/or its markers can be set with the color argument. 
Several formats for specifying the color are supported. First, there are one-letter codes 


2 http://matplotlib.org/api/markers_api.html#module-matplotlib.markers.
Table 3.2 Some Matplotlib marker styles

Code    Marker    Description
.       ·         point
o       ○         circle
+       +         plus
x       ×         x
D       ◇         diamond
v       ▽         downward triangle
^       △         upward triangle
s       □         square
*       ★         star


Table 3.3 Matplotlib color code letters

Code    Color
b       blue
g       green
r       red
c       cyan
m       magenta
y       yellow
k       black
w       white

Figure 3.6 $f_n(x) = x^n \sin x$ for n = 1, 2, 3, 4.


for some common colors, given in Table 3.3. For example, color='r' specifies a red
line and markers. The default color sequence for a series of lines on the same plot is in
the same order as this table.

Alternatively, shades of gray can be specified as a string representing a float in the
range 0-1 ('0.' being black and '1.' being white). HTML hex strings giving the
Table 3.4 Matplotlib line styles

Code    Line style
-       solid
--      dashed
:       dotted
-.      dash-dot

Figure 3.7 Two different line styles on the same plot.



red, green and blue (RGB) components of the color in the range 00 - ff can also
be passed in the color argument (e.g., color='#ff00ff' is magenta). Finally, the
RGB components can also be passed as a tuple of three values in the range 0-1 (e.g.,
color=(0.5, 0., 0.) is a dark red color).
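
The relationship between the hex-string and tuple formats can be sketched in a few lines of standard Python. The helper hex_to_rgb below is hypothetical, written for this illustration; it is not a pylab or Matplotlib function.

```python
def hex_to_rgb(hex_color):
    """Convert an HTML hex string '#rrggbb' to an (r, g, b) tuple in the range 0-1."""
    digits = hex_color.lstrip('#')
    if len(digits) != 6:
        raise ValueError('expected a string of the form #rrggbb')
    # Each pair of hex digits is one component in the range 00 - ff (0 - 255)
    return tuple(int(digits[i:i+2], 16) / 255 for i in (0, 2, 4))

print(hex_to_rgb('#ff00ff'))    # magenta: full red and blue, no green
```

Either form may then be passed as the color argument, as described above.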

Line styles and widths 

The default plot line style is a solid line of weight 1.0 pt. To customize this, set the
linestyle argument (also a string). Some of the possible line style settings are given
in Table 3.4.

To draw no line at all, set linestyle='' (the empty string). The thickness of a line
can be specified in points by passing a float to the linewidth argument.

For example,

ax = pylab.linspace(0.1, 1., 100)
ayi = 1./ax
aye = 10. * pylab.exp(-2.*ax)
pylab.plot(ax, ayi, color='r', linestyle=':', linewidth=4.)
pylab.plot(ax, aye, color='m', linestyle='--', linewidth=2.)
pylab.show()


This code produces Figure 3.7. 
Figure 3.8 A plot produced with explicitly defined data limits.

The following abbreviations for the plot line properties are also valid: 

• c for color 

• ls for linestyle 

• lw for linewidth 

For example, 

pylab.plot(x, y, c='g', ls='--', lw=2) # a thick, green, dashed line 

It is also possible to specify the color, linestyle and marker style in a single string: 

pylab.plot(x, y, 'r:^')    # a red, dotted line with triangle markers

Finally, multiple lines can be plotted using a sequence of x, y, format arguments:

pylab.plot(x, y1, 'r--', x, y2, 'k-.')

plots a red dashed line for (x, y1) and a black dash-dot line for (x, y2).

Plot limits 

The methods pylab.xlim and pylab.ylim set the x- and y-limits of the plot respectively.
They must be called after any pylab.plot statements, before showing or saving
the figure. For example, the following code produces a plot of the provided data series
between chosen limits (Figure 3.8):

t = pylab.linspace(0, 2, 1000)
f = t * pylab.exp(t + pylab.sin(20*t))
pylab.plot(t, f)
pylab.xlim(1.5,1.8)
pylab.ylim(0,30)
pylab.show()


Example E3.4 Moore’s Law is the observation that the number of transistors on CPUs
approximately doubles every two years. The following program illustrates this with
a comparison between the actual number of transistors on high-end CPUs from
between 1972 and 2012, and that predicted by Moore’s Law, which may be stated
mathematically as:

$$n_i = n_0 \, 2^{(y_i - y_0)/T_2},$$

where $n_0$ is the number of transistors in some reference year, $y_0$, and $T_2 = 2$ is the
number of years taken to double this number. Because the data cover 40 years, the
values of $n_i$ span many orders of magnitude, and it is convenient to apply Moore’s Law
to its logarithm, which shows a linear dependence on y:

$$\log_{10} n_i = \log_{10} n_0 + \frac{y_i - y_0}{T_2} \log_{10} 2.$$


Listing 3.2 An illustration of Moore’s Law 

# eg3-moore.py
import pylab

# The data - lists of years:
year = [1972, 1974, 1978, 1982, 1985, 1989, 1993, 1997, 1999, 2000, 2003,
        2004, 2007, 2008, 2012]
# and number of transistors (ntrans) on CPUs in millions:
ntrans = [0.0025, 0.005, 0.029, 0.12, 0.275, 1.18, 3.1, 7.5, 24.0, 42.0,
          220.0, 592.0, 1720.0, 2046.0, 3100.0]
# turn the ntrans list into a pylab array and multiply by 1 million
ntrans = pylab.array(ntrans) * 1.e6

y0, n0 = year[0], ntrans[0]
# A linear array of years spanning the data's years
y = pylab.linspace(y0, year[-1], year[-1] - y0 + 1)
# Time taken in years for the number of transistors to double
T2 = 2.
moore = pylab.log10(n0) + (y - y0) / T2 * pylab.log10(2)

pylab.plot(year, pylab.log10(ntrans), '*', markersize=12, color='r',
           markeredgecolor='r', label='observed')
pylab.plot(y, moore, linewidth=2, color='k', linestyle='--',
           label='predicted')
pylab.legend(fontsize=16, loc='upper left')
pylab.xlabel('Year', fontsize=16)
pylab.ylabel('log(ntrans)', fontsize=16)
pylab.title("Moore's Law")
pylab.show()


In this example, the data are given in two lists of equal length representing the year 
and representative number of transistors on a CPU in that year. The Moore’s Law 
formula above is implemented in logarithmic form, using an array of years spanning 
the provided data. (Actually, since on a logarithmic scale this will be a straight line, 
really only two points are needed.) 

For the plot, shown in Figure 3.9, the data are plotted as largeish stars and the Moore’s 
Law prediction as a thick black line. 
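
As a quick arithmetic check of the logarithmic formula, the sketch below evaluates the prediction for the final year of the data using only the math module. The reference values are taken from the first entries of the data lists in the listing; the function name moore_log10 is introduced here for illustration.

```python
import math

y0, n0 = 1972, 2500      # reference year and transistor count (0.0025 million)
T2 = 2                   # doubling time in years

def moore_log10(year):
    """Base-10 logarithm of the transistor count predicted by Moore's Law."""
    return math.log10(n0) + (year - y0) / T2 * math.log10(2)

predicted = 10 ** moore_log10(2012)
observed = 3100.0e6      # the 2012 entry in the data
print('predicted: {:.3g}, observed: {:.3g}'.format(predicted, observed))
```

Forty years corresponds to 20 doublings, so the prediction is 2500 × 2²⁰, the same order of magnitude as the observed value.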
Figure 3.9 Moore’s Law.


3.2.3 Exercises 
Problems 

P3.2.1 A molecule, A, reacts to form either B or C with first-order rate constants $k_1$
and $k_2$ respectively. That is,

$$\frac{\mathrm{d}[\mathrm{A}]}{\mathrm{d}t} = -(k_1 + k_2)[\mathrm{A}],$$

and so

$$[\mathrm{A}] = [\mathrm{A}]_0 \, e^{-(k_1 + k_2)t},$$

where $[\mathrm{A}]_0$ is the initial concentration of A. The product concentrations (starting
from 0) increase in the ratio $[\mathrm{B}]/[\mathrm{C}] = k_1/k_2$, and conservation of matter requires
$[\mathrm{B}] + [\mathrm{C}] = [\mathrm{A}]_0 - [\mathrm{A}]$. Therefore,

$$[\mathrm{B}] = \frac{k_1}{k_1 + k_2} [\mathrm{A}]_0 \left(1 - e^{-(k_1 + k_2)t}\right).$$

For a reaction with $k_1$ = 300 s$^{-1}$ and $k_2$ = 100 s$^{-1}$, plot the concentrations of A, B
and C against time given an initial concentration of reactant, $[\mathrm{A}]_0$ = 2.0 mol dm$^{-3}$.

P3.2.2 A Gaussian integer is a complex number whose real and imaginary parts are
both integers. A Gaussian prime is a Gaussian integer x + iy such that either:

• one of x and y is zero and the other is a prime number of the form 4n + 3 or
  −(4n + 3) for some integer n ≥ 0; or

• both x and y are nonzero and $x^2 + y^2$ is prime.

Consider the sequence of Gaussian integers traced out by an imaginary particle,
initially at $c_0$, moving in the complex plane according to the following rule: it takes
integer steps in its current direction (±1 in either the real or imaginary direction), but
turns left if it encounters a Gaussian prime. Its initial direction is in the positive real
direction ($\Delta c = 1 + 0i \Rightarrow \Delta x = 1, \Delta y = 0$). The path traced out by the particle is
called a Gaussian prime spiral.

Write a program to plot the Gaussian prime spiral starting at $c_0 = 5 + 23i$.

P3.2.3 The annual risk of death (given as “1 in N”) for men and women in the UK in 
2005 for different age ranges is given in the table below. Use pylab to plot these data 
on a single chart. 


Age range    Female    Male
< 1          227       177
1-4          5376      4386
5-14         10417     8333
15-24        4132      1908
25-34        2488      1215
35-44        1106      663
45-54        421       279
55-64        178       112
65-74        65        42
75-84        21        15
> 84         7         6


3.3 More advanced plotting 

3.3.1 Polar plots 

pylab.plot produces a plot on Cartesian (x, y) axes. To produce a polar plot using
$(r, \theta)$ coordinates, use pylab.polar, which is passed arguments theta (which is
usually the independent variable) and r.


Example E3.5 A cardioid is the plane figure described in polar coordinates by
$r = 2a(1 + \cos\theta)$ for $0 \le \theta \le 2\pi$:

theta = pylab.linspace(0, 2.*pylab.pi, 1000)
a = 1.
r = 2 * a * (1. + pylab.cos(theta))
pylab.polar(theta, r)
pylab.show()

The polar graph plotted by this code is illustrated in Figure 3.10.

Figure 3.10 The cardioid figure formed with a = 1.


3.3.2 Histograms 

A histogram represents the distribution of data as a series of (usually vertical) bars with
lengths in proportion to the number of data items falling into predefined ranges (known
as bins). That is, the range of data values is divided into intervals and the histogram
constructed by counting the number of data values in each interval.

The pylab function hist produces a histogram from a sequence of data values. The
number of bins can be passed as an optional argument, bins; its default value is 10.
Also by default the heights of the histogram bars are absolute counts of the data in the
corresponding bin; setting the argument normed=True (called density=True in more
recent Matplotlib releases, in which normed was deprecated and later removed)
normalizes the histogram so that its area (the height times width of each bar summed
over the total number of bars) is unity.
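
The counting and normalization described above can be sketched in pure Python. This is an illustration of the idea only, under the assumption of equal-width bins spanning the data range; it is not how pylab.hist is implemented, and the function name histogram and its return values are choices made for this sketch.

```python
def histogram(data, bins=10, density=False):
    """Count data values in equal-width bins between min(data) and max(data).

    Returns (heights, width). Assumes the data are not all identical.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for value in data:
        # The largest value belongs to the last bin rather than a new one
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    if density:
        # Scale so that the sum of height * width over all bars is 1
        return [c / (len(data) * width) for c in counts], width
    return counts, width

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
counts, width = histogram(data, bins=4)
heights, width = histogram(data, bins=4, density=True)
print(counts, sum(h * width for h in heights))
```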

For example, take 5,000 random values from the normal distribution with mean 0 and
standard deviation 2 (see Section 4.5.1):

>>> import pylab
>>> import random
>>> data = []
>>> for i in range(5000):
...     data.append(random.normalvariate(0, 2))
...
>>> pylab.hist(data, bins=20, normed=True)
>>> pylab.show()

The resulting histogram is plotted in Figure 3.11.


3.3.3 Multiple axes 

The command pylab.twinx() starts a new set of axes with the same x-axis as the
original one, but a new y-scale. This is useful for plotting two or more data series
that share an abscissa (x-axis) but have y values that differ widely in magnitude or
that have different units. This is illustrated in the following example.

Figure 3.11 A histogram of random, normally distributed data.


Example E3.6 As described at http://tylervigen.com/, there is a curious but utterly
meaningless correlation over time between the divorce rate in the US state of Maine
and the per capita consumption of margarine in that country. The two time series here
have different units and meanings and so should be plotted on separate y-axes, sharing
a common x-axis (year).

Listing 3.3 The correlation between margarine consumption in the United States and the divorce rate in 
Maine 


# eg3-margarine-divorce.py
import pylab

years = range(2000, 2010)
divorce_rate = [5.0, 4.7, 4.6, 4.4, 4.3, 4.1, 4.2, 4.2, 4.2, 4.1]
margarine_consumption = [8.2, 7, 6.5, 5.3, 5.2, 4, 4.6, 4.5, 4.2, 3.7]

line1 = pylab.plot(years, divorce_rate, 'b-o',           # ➊
                   label='Divorce rate in Maine')
pylab.ylabel('Divorces per 1000 people')
pylab.legend()

pylab.twinx()
line2 = pylab.plot(years, margarine_consumption, 'r-o',
                   label='Margarine consumption')
pylab.ylabel('lb of Margarine (per capita)')

# Jump through some hoops to get both lines' labels in the same legend:
lines = line1 + line2                                    # ➋
labels = []
for line in lines:
    labels.append(line.get_label())                      # ➌
pylab.legend(lines, labels)
pylab.show()

Figure 3.12 The correlation between the divorce rate in Maine and the per capita margarine
consumption in the United States.

We have a bit of extra work to do in order to place a legend labeled with both lines on
the plot: ➊ pylab.plot returns a list of objects representing the lines that are plotted,
so we save them as line1 and line2, ➋ concatenate them, and then ➌ loop over them
to retrieve their labels. The list of lines and labels can then be passed to pylab.legend
directly. The result of this code is the graph plotted in Figure 3.12.


3.3.4 Exercises 
Problems 

P3.3.1 A spiral may be considered to be the figure described by the motion of a point
on an imaginary line as that line pivots around an origin at constant angular velocity. If
the point is fixed on the line, then the figure described is a circle.

a. If the point on the rotating line moves from the origin with constant speed, its
   position describes an Archimedean spiral. In polar coordinates the equation of
   this spiral is $r = a + b\theta$. Use pylab to plot the spiral defined by a = 0, b = 2 for
   $0 \le \theta \le 8\pi$.

b. If the point moves along the rotating line with a velocity that increases in
   proportion to its distance from the origin, the result is a logarithmic spiral, which
   may be written as $r = a^\theta$. Plot the logarithmic spiral defined by a = 0.8 for
   $0 \le \theta \le 8\pi$. The logarithmic spiral has the property of self-similarity: with each
   $2\pi$ whorl, the spiral grows but maintains its shape.³ Logarithmic spirals occur
   frequently in nature, from the arrangements of the chambers of nautilus shells to
   the shapes of galaxies.

3 The Swiss mathematician Jakob Bernoulli was so taken with this property that he named the logarithmic
spiral Spira mirabilis (the “miraculous spiral”) and wanted one engraved on his headstone with the phrase
“Eadem mutata resurgo” (“Although changed, I shall arise the same”). Unfortunately, an Archimedean spiral
was engraved by mistake.


P3.3.2 A simple model for the interaction potential between two atoms as a function
of their distance, r, is that of Lennard-Jones:

$$U(r) = \frac{B}{r^{12}} - \frac{A}{r^6},$$

where A and B are positive constants.⁴

For Argon atoms, these constants may be taken to be $A = 1.024 \times 10^{-23}$ J nm$^6$ and
$B = 1.582 \times 10^{-26}$ J nm$^{12}$.

a. Plot U(r). On a second y-axis on the same figure, plot the interatomic force

   $$F(r) = -\frac{\mathrm{d}U}{\mathrm{d}r} = \frac{12B}{r^{13}} - \frac{6A}{r^7}.$$

   Your plot should show the “interesting” part of these curves, which tend rapidly
   to very large values at small r.

   Hint: life is easier if you divide A and B by Boltzmann’s constant, $1.381 \times 10^{-23}$ J K$^{-1}$,
   so as to measure U(r) in units of K. What is the depth, $\epsilon$, and
   location, $r_0$, of the potential minimum for this system?

b. For small displacements from the equilibrium interatomic separation (where
   F = 0), the potential may be approximated to the harmonic oscillator function,
   $V(r) = \frac{1}{2}k(r - r_0)^2 + \epsilon$, where

   $$k = \left|\frac{\mathrm{d}^2U}{\mathrm{d}r^2}\right|_{r_0} = \frac{156B}{r_0^{14}} - \frac{42A}{r_0^8}.$$

   Plot U(r) and V(r) on the same diagram.


P3.3.3 The seedhead of a sunflower may be modeled as follows. Number the n seeds
$s = 1, 2, \ldots, n$ and place each seed a distance $r = \sqrt{s}$ from the origin, rotated
$\theta = 2\pi s/\phi$ from the x axis, where $\phi$ is some constant. The choice nature makes for $\phi$ is the
golden ratio, $\phi = (1 + \sqrt{5})/2$, which maximizes the packing efficiency of the seeds as
the seedhead grows.

Write a Python program to plot a model sunflower seedhead. (Hint: use polar coordinates.)


4 This was popular in the early days of computing because $r^{-12}$ is easy to compute as the square of $r^{-6}$.
4 The core Python language II


This chapter continues the introduction to the core Python language started in Chapter 2
with a description of Python error handling with exceptions, the data structures known
as dictionaries and sets, some convenient and efficient idioms to achieve common tasks,
and a survey of some of the modules provided in the Python standard library. Finally,
we present a brief introduction to object-oriented programming with Python.


4.1 Errors and exceptions 

Python distinguishes between two types of error: syntax errors and other exceptions.
Syntax errors are mistakes in the grammar of the language and are checked for before 
the program is executed. Exceptions are runtime errors: conditions usually caused by 
attempting an invalid operation on an item of data. The distinction is that syntax errors 
are always fatal: there is nothing the Python compiler can do for you if your program 
does not conform to the grammar of the language. Exceptions, however, are conditions 
that arise during the running of a Python program (such as division by zero) and a 
mechanism exists for “catching” them and handling the condition gracefully without 
stopping the program’s execution. 
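
The distinction can be illustrated with a short sketch that uses only built-in features. Here the compile built-in stands in for the compilation step Python performs before running a program, and the try ... except construct is used to catch each kind of error; the names source and safe_divide are introduced purely for this illustration.

```python
# A syntax error is detected when the source is compiled, before any of it runs
source = "print('this line is never executed')\nfor f in range(8:\n    print(f)\n"
try:
    compile(source, '<example>', 'exec')
    syntax_ok = True
except SyntaxError:
    syntax_ok = False
print('Source compiled successfully:', syntax_ok)

# An exception arises at run time and can be handled gracefully
def safe_divide(a, b):
    """Return a / b, or nan if b is zero."""
    try:
        return a / b
    except ZeroDivisionError:
        return float('nan')

print(safe_divide(6, 3), safe_divide(1, 0))
```

Note that the first print call inside source never produces output: the whole string is rejected at the compilation stage because of the unclosed parenthesis on its second line.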


4.1.1 Syntax errors 

Syntax errors are caught by the Python compiler and produce a message indicating
where the error occurred. For example,

>>> for lambda in range(8):
  File "<stdin>", line 1
    for lambda in range(8):
SyntaxError: invalid syntax

Because lambda is a reserved keyword, it cannot be used as a variable name. Its
occurrence where a variable name is expected is therefore a syntax error. Similarly,

>>> for f in range(8:
  File "<stdin>", line 1
    for f in range(8:
SyntaxError: invalid syntax

The syntax error here occurs because a single argument to the range built-in must be
given as an integer between parentheses: the colon breaks the syntax of calling functions
and so Python complains of a syntax error.

Because a line of Python code may be split within an open bracket (“()”, “[]”, or
“{}”), a statement split over several lines can sometimes cause a SyntaxError to be
indicated somewhere other than the location of the true bug. For example,

>>> a = [1, 2, 3, 4,
...
...
... b = 5
  File "<stdin>", line 4
    b = 5
SyntaxError: invalid syntax

Here, the statement b = 5 is syntactically valid: the error arises from failing to close
the square bracket of the previous list declaration. (The Python shell indicates that a
line is a continuation of a previous one with an initial ellipsis, “...”.)

There are two special types of SyntaxError that are worth mentioning: an
IndentationError occurs when a block of code is improperly indented and a TabError
is raised when tabs and spaces are mixed inconsistently to provide indentation. 1
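
As a rough illustration, the built-in compile function can be used to see that these
errors are detected when source code is compiled, before any of it runs. The source
strings and the helper name syntax_error_name are made up for this sketch.

```python
# Two deliberately malformed pieces of source code (illustrative only).
bad_indent = "for i in range(3):\nprint(i)\n"    # loop body is not indented
mixed = "if True:\n\tx = 1\n        y = 2\n"     # a tab, then eight spaces

def syntax_error_name(source):
    """Return the name of the exception raised on compiling source, or None."""
    try:
        compile(source, '<example>', 'exec')
    except SyntaxError as err:
        return type(err).__name__
    return None

print(syntax_error_name(bad_indent))
print(syntax_error_name(mixed))
print(syntax_error_name("x = 1\n"))     # valid source: nothing is raised

# Both special cases are themselves kinds of SyntaxError.
print(issubclass(IndentationError, SyntaxError))
print(issubclass(TabError, IndentationError))
```

Neither source string is ever executed: the errors are reported by compile alone, which
is why a try ... except block inside the faulty code itself could not have caught them.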


Example E4.1 A common syntax error experienced by beginner Python programmers
is in using the assignment operator “=” instead of the equality operator “==” in a
conditional expression:

>>> if a = 5:
  File "<stdin>", line 1
    if a = 5:
SyntaxError: invalid syntax

This assignment a = 5 does not return a value (it simply assigns the integer object 5
to the variable name a) and so there is nothing corresponding to True or False that
the if statement can use: hence the SyntaxError. This contrasts with the C language
in which an assignment returns the value of the variable being assigned (and so the
statement a = 5 evaluates to true). This behavior is the source of many hard-to-find
bugs and security vulnerabilities and its omission from the Python language is by
design.


4.1.2 Exceptions 

An exception occurs when a syntactically correct expression is executed and causes a
runtime error. There are different types of built-in exception, and custom exceptions can
be defined by the programmer if required. If an exception is not “caught” using the try
... except clause described later, Python produces a (usually helpful) error message.
If the exception occurs within a function (which may have been called, in turn, by
another function, and so on), the message returned takes the form of a stack traceback:
the history of function calls leading to the error is reported so that its location in the
program execution can be determined.

1 This error can be avoided by using only spaces to indent code.

Some built-in exceptions will be familiar from your use of Python so far.

NameError

>>> print('4z =', 4*z)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'z' is not defined

A NameError exception occurs when a variable name is used that hasn’t been defined: 
the print statement here is valid, but Python doesn’t know what to print for z. 

ZeroDivisionError

>>> a, b = 0, 5
>>> b / a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero

Division by zero is not mathematically defined. 

TypeError and ValueError 

A TypeError is raised if an object of the wrong type is used in an expression or
function. For example,

>>> '00' + 7
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't convert 'int' object to str implicitly

Python is a (fairly) strongly typed language, and it is not possible to add a string to an 
integer. 2 

A ValueError, on the other hand, occurs when the object involved has the correct
type but an invalid value:

>>> float('hello')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: 'hello'


2 Unlike in, say, JavaScript or PHP, where it seems anything goes. 


Downloaded from https:/www.cambridge.org/core. New York University, on 21 Feb 2017 at 16:46:20, subject to the Cambridge Core terms of use, available at 

https:/www.cambridge.org/core/terms. https://doi.Org/1 0.101 7/CB09781 1 39871 754.004 





Table 4.1 Common Python exceptions

Exception            Cause and description
-----------------    ------------------------------------------------------------------
FileNotFoundError    Attempting to open a file or directory that does not exist - this
                     exception is a particular type of OSError.
IndexError           Indexing a sequence (such as a list or string) with a subscript
                     that is out of range.
KeyError             Indexing a dictionary with a key that does not exist in that
                     dictionary (see Section 4.2.2).
NameError            Referencing a local or global variable name that has not been
                     defined.
TypeError            Attempting to use an object of an inappropriate type as an
                     argument to a built-in operation or function.
ValueError           Attempting to use an object of the correct type but with an
                     incompatible value as an argument to a built-in operation or
                     function.
ZeroDivisionError    Attempting to divide by zero, either explicitly (using ‘/’ or ‘//’)
                     or as part of a modulo operation ‘%’.
SystemExit           Raised by the sys.exit function (see Section 4.4.1) - if not
                     handled, this function causes the Python interpreter to exit.


The float built-in does take a string as its argument, so float('hello') is not
a TypeError: the exception is raised because the particular string 'hello' does not
evaluate to a meaningful floating point number. More subtly,

>>> int('7.0')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '7.0'

A string that looks like a float cannot be directly cast to int: to obtain the result
probably intended, use int(float('7.0')).

Table 4.1 provides a list of the more commonly encountered built-in exceptions and 
their descriptions. 
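
Several of the exceptions in Table 4.1 can be provoked on purpose to see which type
each operation raises. The following is only a sketch: the failing operations and the
names attempts and raised are chosen for illustration.

```python
# Each entry pairs a description with a zero-argument function that should fail.
attempts = [
    ('index out of range', lambda: [1, 2, 3][5]),
    ('missing dictionary key', lambda: {'a': 1}['b']),
    ('wrong type', lambda: '00' + 7),
    ('bad value', lambda: int('7.0')),
    ('modulo by zero', lambda: 1 % 0),
]

raised = []
for description, operation in attempts:
    try:
        operation()
    except (IndexError, KeyError, TypeError, ValueError, ZeroDivisionError) as err:
        # Record the name of the exception class that was actually raised
        raised.append(type(err).__name__)
        print('{}: {}'.format(description, type(err).__name__))

# The two-step conversion suggested above does succeed:
print(int(float('7.0')))
```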


Example E4.2 When an exception is raised but not handled (see Section 4.1.3), Python
will issue a traceback report indicating where in the program flow it occurred. This
is particularly useful when an error occurs within nested functions or within imported
modules. For example, consider the following short program: 3

# exception-test.py
import math

def func(x):
    def trig(x):
        for f in (math.sin, math.cos, math.tan):
            print('{f}({x}) = {res}'.format(f=f.__name__, x=x, res=f(x)))
    def invtrig(x):
        for f in (math.asin, math.acos, math.atan):
            print('{f}({x}) = {res}'.format(f=f.__name__, x=x, res=f(x)))
    trig(x)
    invtrig(x)

func(1.2)

The function func passes its argument, x, to its two nested functions. The first, trig,
is unproblematic but the second, invtrig, is expected to fail for x out of the domain
(range of acceptable values) for the inverse trigonometric function, as in:

sin(1.2) = 0.9320390859672263
cos(1.2) = 0.3623577544766736
tan(1.2) = 2.5721516221263183
Traceback (most recent call last):
  File "exception-test.py", line 14, in <module>
    func(1.2)
  File "exception-test.py", line 12, in func
    invtrig(x)
  File "exception-test.py", line 10, in invtrig
    print('{f}({x}) = {res}'.format(f=f.__name__, x=x, res=f(x)))
ValueError: math domain error

Following the traceback backward shows that the ValueError exception was raised
within invtrig (line 10), which was called from within func (line 12), which
was itself called by the exception-test.py module (i.e., program) at line 14.

3 Note the use of f.__name__ to return a string representation of a function’s name in
this program; for example, math.sin.__name__ is 'sin'.


4.1.3 Handling and raising exceptions 

Handling exceptions 

Often, a program must manipulate data in a way which might cause an exception to
be raised. Assuming such a condition is not to cause the program to exit with an error
but to be handled “gracefully” in some sense (an invalid data point ignored, division by
a zero value skipped, and so on), there are two approaches to this situation: check the
value of the data object before using it, or “handle” any exception that is raised before
resuming execution. The Pythonic approach is the latter, summed up in the expression
It is Easier to Ask Forgiveness than to seek Permission (EAFP).

To catch an exception in a block of code, write the code within a try: clause and
handle any exceptions raised in an except: clause. For example,

try:
    y = 1 / x
    print('1 /', x, ' = ', y)
except ZeroDivisionError:
    print('1 / 0 is not defined.')
# ... more statements

No check is required: we go ahead and calculate 1/x and handle the error arising from
division by zero if necessary. The program execution continues after the except block
whether the ZeroDivisionError exception was raised or not. If a different exception
is raised (e.g., a NameError because x is not defined), then this will not be caught - it
is an unhandled exception and will trigger an error message.
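
The two approaches described above (checking the value first, or attempting the
operation and handling any exception) can be compared side by side. This is a minimal
sketch; the function names reciprocal_check_first and reciprocal_eafp are hypothetical.

```python
def reciprocal_check_first(x):
    """Test the value before using it; return None if 1/x is undefined."""
    if x == 0:
        return None
    return 1 / x

def reciprocal_eafp(x):
    """Attempt the division and handle the exception if it is raised."""
    try:
        return 1 / x
    except ZeroDivisionError:
        return None

for x in (4, 0):
    print(x, reciprocal_check_first(x), reciprocal_eafp(x))
```

Both functions give the same results here; the second simply does not need to know in
advance which values of x are problematic.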

To handle more than one exception in a single except block, list them in a tuple
(which must be enclosed in parentheses).

try:
    y = 1. / x
    print('1 /', x, ' = ', y)
except (ZeroDivisionError, NameError):
    print('x is zero or undefined!')
# ... more statements

To handle each exception separately, use more than one except clause:

try:
    y = 1. / x
    print('1 /', x, ' = ', y)
except ZeroDivisionError:
    print('1 / 0 is not defined.')
except NameError:
    print('x is not defined')
# ... more statements


Warning: You may come across the following type of construction:

try:
    [do something]
except:    # Don't do this!
    pass

This will execute the statements in the try block and ignore any exceptions raised - it
is very unwise to do this as it makes code very hard to maintain and debug (errors,
whatever their cause, are silently suppressed). Always catch specific exceptions and
handle them appropriately, allowing any other exceptions to “bubble up” to be handled
(or not) by any other except clauses.

The try ... except statement has two more optional clauses (which must follow
any except clauses if they are used). Statements in a block following the finally
keyword are always executed, whether an exception was raised or not. Statements in
a block following the else keyword are executed if an exception was not raised (see
Example E4.5).
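
The order in which these clauses run can be checked with a short sketch that records
the name of each block as it is entered. The function name clause_order is hypothetical.

```python
def clause_order(x):
    """Return the names of the clauses executed while computing 1 / x."""
    ran = []
    try:
        ran.append('try')
        y = 1 / x
    except ZeroDivisionError:
        ran.append('except')
    else:
        ran.append('else')        # only reached if no exception was raised
    finally:
        ran.append('finally')     # reached in every case
    return ran

print(clause_order(2))    # ['try', 'else', 'finally']
print(clause_order(0))    # ['try', 'except', 'finally']
```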


◊ Raising exceptions

Usually an exception is raised by the Python interpreter as a result of some behavior
(anticipated or not) by the program. But sometimes it is desirable for a program to raise
a particular exception if some condition is met. The raise keyword allows a program
to force a specific exception and customize the message or other data associated with it.
For example,

if n % 2:
    raise ValueError('n must be even!')
# statements here may proceed, knowing n is even ...
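
An exception raised in this way can be caught by the calling code like any other. The
sketch below assumes a hypothetical function, half, that insists on an even argument;
the message text is also made up for illustration.

```python
def half(n):
    """Return n // 2, raising ValueError if n is odd."""
    if n % 2:
        raise ValueError('n must be even! Got {}'.format(n))
    return n // 2

results = []
for n in (10, 7):
    try:
        results.append(half(n))
    except ValueError as err:
        # str(err) is the message that was passed to ValueError above
        results.append(str(err))

print(results)    # [5, 'n must be even! Got 7']
```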

A related keyword, assert, evaluates a conditional expression and raises an
AssertionError exception if that expression is not True. This is useful to check
that some essential condition holds at a specific point in your program’s execution and
is often helpful in debugging.

>>> assert 2==2     # [silence]: 2==2 is True so nothing happens
>>>
>>> assert 1==2     # Will raise the AssertionError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

The syntax assert expr1, expr2 passes expr2 (typically an error message) to the
AssertionError:

>>> assert 1==2, 'One does not equal two'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: One does not equal two

Python is a dynamically typed language and arguments of any type can be legally 
passed to a function, even if that function is expecting a particular type. It is sometimes 
necessary to check that an argument object is of a suitable type before using it, and 
assert could be used to do this. 


Example E4.3 The following function returns a string representation of a two-dimensional
(2D) or three-dimensional (3D) vector, which must be represented as
a list or tuple containing two or three items.

>>> def str_vector(v):
...     assert type(v) is list or type(v) is tuple,\
...         'argument to str_vector must be a list or tuple'
...     assert len(v) in (2,3),\
...         'vector must be 2D or 3D in str_vector'
...     unit_vectors = ['i','j','k']
...     s = []
...     for i, component in enumerate(v):
...         s.append('{}{}'.format(component, unit_vectors[i]))
...     return '+'.join(s).replace('+-', '-')     # ➊

➊ replace() here converts, for example, '4i+-3j' into '4i-3j'.
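
Because assert statements are skipped when Python is run with the -O option, checks
that must always be enforced are often written with explicit raise statements instead.
The following is a minimal sketch of that alternative; check_vector is a hypothetical
helper and is not part of the example above.

```python
def check_vector(v):
    """Raise TypeError or ValueError unless v is a 2D or 3D list or tuple."""
    if not isinstance(v, (list, tuple)):
        raise TypeError('vector must be a list or tuple')
    if len(v) not in (2, 3):
        raise ValueError('vector must be 2D or 3D')
    return True

outcomes = []
for candidate in ([1, 2], (4, 0.5, -2), 'ij', [1, 2, 3, 4]):
    try:
        check_vector(candidate)
        outcomes.append('ok')
    except (TypeError, ValueError) as err:
        outcomes.append(type(err).__name__)

print(outcomes)    # ['ok', 'ok', 'TypeError', 'ValueError']
```

Using isinstance rather than comparing type(v) directly also accepts subclasses of list
and tuple, which may or may not be what is wanted.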


Example E4.4 As another example, suppose you have a function that calculates the
vector (cross) product of two vectors represented as list objects. This product is only
defined for three-dimensional vectors, so calling it with lists of any other length is an
error.

>>> def cross_product(a, b):
...     assert len(a) == len(b) == 3, 'Vectors a, b must be three-dimensional'
...     return [a[1]*b[2] - a[2]*b[1],
...             a[2]*b[0] - a[0]*b[2],
...             a[0]*b[1] - a[1]*b[0]]
...
>>> cross_product([1, 2, -1], [2, 0, -1, 3])     # Oops
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in cross_product
AssertionError: Vectors a, b must be three-dimensional
>>> cross_product([1, 2, -1], [2, 0, -1])
[-2, -1, -4]


Example E4.5 The following code gives an example of the use of a try ... except
... else ... finally clause:

# try-except-else-finally.py

def process_file(filename):
    try:
        fi = open(filename, 'r')
    except IOError:
        print('Oops: couldn\'t open {} for reading'.format(filename))
        return
    else:
        lines = fi.readlines()     # ➊
        print('{} has {} lines.'.format(filename, len(lines)))
        fi.close()
    finally:
        print(' Done with file {}'.format(filename))     # ➋
    print('The first line of {} is:\n{}'.format(filename, lines[0]))
    # further processing of the lines ...
    return

process_file('sonnet0.txt')
process_file('sonnet18.txt')

➊ Within the else block, the contents of the file are only read if the file was
successfully opened.

➋ Within the finally block, ‘Done with file filename’ is printed whether the
file was successfully opened or not.

Assuming that the file sonnet0.txt does not exist but that sonnet18.txt does,
running this program prints:

Oops: couldn't open sonnet0.txt for reading
 Done with file sonnet0.txt
sonnet18.txt has 14 lines.
 Done with file sonnet18.txt
The first line of sonnet18.txt is:
Shall I compare thee to a summer's day?
4.1.4 Exercises
Questions 

Q4.1.1 What is the point of else? Why not put statements in this block inside the 
original try block? 


Q4.1.2 What is the point of the finally clause? Why not put any statements you want
executed after the try block (regardless of whether or not an exception has been raised)
after the entire try ... except clause?

Hint: see what happens if you modify Example E4.5 to put the statements in the
finally clause after the try block.


Problems 


P4.1.1 Write a program to read in the data from the file swallow-speeds.txt (available
at scipython.com/ex/ada) and use it to calculate the average air-speed velocity of
an (unladen) African swallow. Use exceptions to handle the processing of lines that do
not contain valid data points.


P4.1.2 Adapt the function of Example E4.3, which returns a vector in the following
form:

>>> print(str_vector([-2, 3.5]))
-2i + 3.5j
>>> print(str_vector((4, 0.5, -2)))
4i + 0.5j - 2k

to raise an exception if any element in the vector array does not represent a real number.


P4.1.3 Python follows the convention of many computer languages in choosing to
define $0^0 = 1$. Write a function, powr(a, b), which behaves the same as the Python
expression a**b (or, for that matter, math.pow(a,b)) but raises a ValueError if a
and b are both zero.


4.2 Python objects III: dictionaries and sets 

A dictionary in Python is a type of “associative array” (also known as a “hash” in some 
languages). A dictionary can contain any objects as its values, but unlike sequences 
such as lists and tuples, in which the items are indexed by an integer starting at 0, each 
item in a dictionary is indexed by a unique key, which may be any immutable object. 4 
The dictionary therefore exists as a number of key-value pairs, which do not have any 
particular order. Dictionaries themselves are mutable objects. 


4 Actually, dictionary keys can be any hashable object: a hashable object in Python is one with a special 
method for generating a particular integer from any instance of that object; the idea is that instances (which 
may be large and complex) that compare as equal should have hash numbers that also compare as equal so 
they can be rapidly looked up in a hash table. This is important for some data structures and for optimizing 
the speed of algorithms involving their objects. 
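
The requirement that keys be hashable can be explored directly with the hash built-in.
This is a sketch only: the dictionary grid and its contents are made up for illustration.

```python
# Objects that compare as equal have equal hashes, so either finds the same key.
print(1 == 1.0, hash(1) == hash(1.0))

grid = {}
grid[(0, 1)] = 'a tuple of immutable items is hashable'
print(grid[(0, 1)])

# A list is mutable and unhashable, so it is rejected as a key.
try:
    grid[[0, 1]] = 'this assignment fails'
except TypeError as err:
    message = str(err)
    print('TypeError:', message)
```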
4.2.1 Defining and indexing a dictionary

A dictionary can be defined by giving key: value pairs between braces:

>>> height = {'Burj Khalifa': 828., 'One World Trade Center': 541.3,
              'Mercury City Tower': -1., 'Q1': 323.,
              'Carlton Centre': 223., 'Gran Torre Santiago': 300.,
              'Mercury City Tower': 339.}
>>> height
{'Q1': 323.0,
 'Burj Khalifa': 828.0,
 'Carlton Centre': 223.0,
 'One World Trade Center': 541.3,
 'Mercury City Tower': 339.0,
 'Gran Torre Santiago': 300.0}

The command print(height) will display the dictionary in the same format (between
braces), but in no particular order. If the same key is attached to different values (as
'Mercury City Tower' is here), only the most recent value survives: the keys in a
dictionary are unique.

An individual item can be retrieved by indexing it with its key, either as a literal
('Q1') or with a variable equal to the key:

>>> height['One World Trade Center']
541.3
>>> building = 'Carlton Centre'
>>> height[building]
223.0

Items in a dictionary can also be assigned by indexing it in this way: 

height['Empire State Building'] = 381. 
height['The Shard'] = 306. 

An alternative way of defining a dictionary is to pass a sequence of (key, value)
pairs to the dict constructor. If the keys are simple strings (of the sort that could be
used as variable names), the pairs can also be specified as keyword arguments to this
constructor:

>>> ordinal = dict([(1, 'First'), (2, 'Second'), (3, 'Third')])
>>> mass = dict(Mercury=3.301e23, Venus=4.867e24, Earth=5.972e24)
>>> ordinal[2]     # NB 2 here is a key, not an index
'Second'
>>> mass['Earth']
5.972e+24

A for-loop iteration over a dictionary returns the dictionary keys (in no particular
order):

>>> for c in ordinal:
...     print(c, ordinal[c])
...
3 Third
1 First
2 Second
Example E4.6 A simple dictionary of roman numerals:

>>> numerals = {'one':'I', 'two':'II', 'three':'III', 'four':'IV', 'five':'V',
...             'six':'VI', 'seven':'VII', 'eight':'VIII',
...             1:'I', 2:'II', 3:'III', 4:'IV', 5:'V', 6:'VI', 7:'VII',
...             8:'VIII'}
>>> for i in ['three', 'four', 'five', 'six']:
...     print(numerals[i], end=' ')
...
III IV V VI
>>> for i in range(8,0,-1):
...     print(numerals[i], end=' ')
...
VIII VII VI V IV III II I

Note that even though the keys are stored in an arbitrary order, the dictionary can be
indexed in any order. Note also that although the dictionary keys must be unique, the
dictionary values need not be.


4.2.2 Dictionary methods 

get()

Indexing a dictionary with a key that does not exist is an error:

>>> mass['Pluto']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Pluto'

However, the useful method get() can be used to retrieve the value, given a key, if
it exists, or some default value if it does not. If no default is specified, then None is
returned. For example,

>>> print(mass.get('Pluto'))
None
>>> mass.get('Pluto', -1)
-1
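
A common use of get() with a default value is counting: the default supplies a
starting value the first time each key is seen. The text and variable names in this
short sketch are made up for illustration.

```python
text = 'the quick brown fox jumps over the lazy dog the end'

counts = {}
for word in text.split():
    # counts.get(word, 0) is 0 for a word that has not been seen before
    counts[word] = counts.get(word, 0) + 1

print(counts['the'])            # 3
print(counts.get('cat', 0))     # 0: 'cat' never appears, and no KeyError is raised
```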


keys, values and items

The three methods, keys, values and items, return, respectively, a dictionary’s keys,
values and key-value pairs (as tuples). In previous versions of Python, each of these
was returned in a list, but for most purposes this is wasteful of memory: calling keys,
for example, required all of the dictionary’s keys to be copied as a list, which in most
cases was simply iterated over. That is, storing a whole new copy of the dictionary’s
keys is not usually necessary. Python 3 solves this by returning an iterable object, which
accesses the dictionary’s keys one by one, without copying them to a list. This is faster
and saves memory (important for very large dictionaries). For example,

>>> planets = mass.keys()
>>> print(planets)
dict_keys(['Venus', 'Mercury', 'Earth'])
>>> for planet in planets:
...     print(planet, mass[planet])
...
Venus 4.867e+24
Mercury 3.301e+23
Earth 5.972e+24

A dict_keys object can be iterated over any number of times, but it is not a list and
cannot be indexed or assigned:

>>> planets = mass.keys()
>>> planets[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dict_keys' object is not subscriptable

If you really do want a list of the dictionary’s keys, simply pass the dict_keys object
to the list constructor (which takes any kind of sequence and makes a list out of it):

>>> planet_list = list(mass.keys())
>>> planet_list[0]
'Venus'
>>> planet_list[1] = 'Jupiter'     # ➊
>>> planet_list
['Venus', 'Jupiter', 'Earth']

➊ This last assignment only changes the planet_list list; it doesn’t alter the original
dictionary’s keys.

Similar methods exist for retrieving a dictionary’s values and items (key-value pairs):
the objects returned are dict_values and dict_items.

For example,

>>> mass.items()
dict_items([('Venus', 4.867e+24), ('Mercury', 3.301e+23), ('Earth', 5.972e+24)])
>>> mass.values()
dict_values([4.867e+24, 3.301e+23, 5.972e+24])
>>> for planet_data in mass.items():
...     print(planet_data)
...
('Venus', 4.867e+24)
('Mercury', 3.301e+23)
('Earth', 5.972e+24)
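
Two further points about these objects are worth a short sketch. The (key, value)
tuples from items can be unpacked directly in the for statement, and the objects
returned by keys, values and items reflect later changes to the dictionary, whereas a
list copy does not. The names keys_view and keys_copy are chosen for illustration.

```python
mass = dict(Mercury=3.301e23, Venus=4.867e24, Earth=5.972e24)

# Unpack each (key, value) tuple into two names as the loop runs.
for planet, m in mass.items():
    print('{}: {:.3e} kg'.format(planet, m))

keys_view = mass.keys()          # tied to the dictionary
keys_copy = list(mass.keys())    # an independent snapshot
mass['Mars'] = 6.417e23

print('Mars' in keys_view, 'Mars' in keys_copy)    # True False
```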


Example E4.7 A Python dictionary can act as a kind of simple database. The following
code stores some information about some astronomical objects in a dictionary of tuples,
keyed by the object name, and manipulates them to produce a list of planet densities.

Listing 4.1 Astronomical data

# eg4-astrodict.py
import math

# Mass (in kg) and radius (in km) for some astronomical bodies
body = {'Sun': (1.988e30, 6.955e5),
        'Mercury': (3.301e23, 2440.),
        'Venus': (4.867e+24, 6052.),
        'Earth': (5.972e24, 6371.),
        'Mars': (6.417e23, 3390.),
        'Jupiter': (1.899e27, 69911.),
        'Saturn': (5.685e26, 58232.),
        'Uranus': (8.682e25, 25362.),
        'Neptune': (1.024e26, 24622.)
       }

planets = list(body.keys())
# The sun isn't a planet!
planets.remove('Sun')

def calc_density(m, r):
    """ Returns the density of a sphere with mass m and radius r. """
    return m / (4/3 * math.pi * r**3)

rho = {}
for planet in planets:
    m, r = body[planet]
    # calculate the density in g/cm3
    rho[planet] = calc_density(m*1000, r*1.e5)

for planet, density in sorted(rho.items()):     # ➊
    print('The density of {0} is {1:3.2f} g/cm3'.format(planet, density))

➊ sorted(rho.items()) returns a list of the rho dictionary’s key-value pairs,
sorted by key.

The output is

The density of Earth is 5.51 g/cm3
The density of Jupiter is 1.33 g/cm3
The density of Mars is 3.93 g/cm3
The density of Mercury is 5.42 g/cm3
The density of Neptune is 1.64 g/cm3
The density of Saturn is 0.69 g/cm3
The density of Uranus is 1.27 g/cm3
The density of Venus is 5.24 g/cm3


◊ Keyword arguments

In Section 2.7, we discussed the syntax for passing arguments to functions. In that
description, it was assumed that the function would always know what arguments could
be passed to it and these were listed in the function definition. For example,

def func(a, b, c):

Python provides a couple of useful features for handling the case where it is not
necessarily known what arguments a function will receive. Including *args (after any
“formally defined” arguments) places any additional positional argument into a tuple,
args, as illustrated by the following code:

>>> def func(a, b, *args):
...     print(args)
...
>>> func(1, 2, 3, 4, 'msg')
(3, 4, 'msg')

That is, inside func, in addition to the formal arguments a=1 and b=2, the arguments
3, 4 and 'msg' are available as the items of the tuple args. This tuple can be
arbitrarily long. Python’s own print built-in function works in this way: it takes an
arbitrary number of arguments to output as a string, followed by some optional keyword
arguments:

def print(*args, sep=' ', end='\n', file=None):

It is also possible to collect arbitrary keyword arguments (see Section 2.7.2) to a
function inside a dictionary by using the **kwargs syntax in the function definition.
Python takes any keyword arguments not specified in the function definition and packs
them into the dictionary kwargs. For example,

>>> def func(a, b, **kwargs):
...     for k in kwargs:
...         print(k, '=', kwargs[k])
...
>>> func(1, b=2, c=3, d=4, s='msg')
d = 4
s = msg
c = 3

One can also use *args and **kwargs when calling a function, which can be
convenient, for example, with functions that take a large number of arguments:

>>> def func(a, b, c, x, y, z):
...     print(a, b, c)
...     print(x, y, z)
...
>>> args = [1,2,3]
>>> kwargs = {'x': 4, 'y': 5, 'z': 'msg'}
>>> func(*args, **kwargs)
1 2 3
4 5 msg
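
Both forms can appear in the same function definition, with *args before **kwargs.
The function describe below is a hypothetical helper, written only to show how the
arguments are collected; the keyword arguments are sorted so that the result does not
depend on the order in which they are stored.

```python
def describe(label, *args, **kwargs):
    """Report how the positional and keyword arguments were collected."""
    return '{}: args={}, kwargs={}'.format(label, args, sorted(kwargs.items()))

print(describe('run', 1, 2, units='m', scale=3))
# run: args=(1, 2), kwargs=[('scale', 3), ('units', 'm')]

# The same call made by unpacking a list and a dictionary:
values = [1, 2]
options = {'units': 'm', 'scale': 3}
print(describe('run', *values, **options))
```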


4.2.3 Sets 

A set is an unordered collection of unique items. As with dictionary keys, elements of
a set must be hashable objects. A set is useful for removing duplicates from a sequence
and for determining the union, intersection and difference between two collections.
Because they are unordered, set objects cannot be indexed or sliced, but they can be
iterated over, tested for membership, and they support the len built-in. A set is created
by listing its elements between braces ({...}) or by passing an iterable to the set()
constructor:

>>> s = set([1, 1, 4, 3, 2, 2, 3, 4, 1, 3, 'surprise!'])
>>> s
{1, 2, 'surprise!', 3, 4}
>>> len(s)     # cardinality of the set
5
>>> 2 in s, 6 not in s     # membership, nonmembership
(True, True)
>>> for item in s:
...     print(item)
...
1
2
surprise!
3
4

The set method add is used to add elements to the set. To remove elements there are
several methods: remove removes a specified element but raises a KeyError exception
if the element is not present in the set; discard does the same but does not raise an
error in this case. Both methods take (as a single argument) the element to be removed.
pop (with no argument) removes an arbitrary element from the set and clear removes
all elements from the set:

>>> s = {2,-2,0}
>>> s.add(1)
>>> s.add(-1)
>>> s.add(1.0)     # ➊
>>> s
{0, 1, 2, -1, -2}
>>> s.remove(1)
>>> s
{0, 2, -1, -2}
>>> s.discard(3)     # OK - does nothing
>>> s
{0, 2, -1, -2}
>>> s.pop()
0     # (for example)
>>> s
{2, -1, -2}
>>> s.clear()
>>> s
set()     # the empty set

➊ This statement will not add a new member to the set, even though the existing 1 is
an integer and the item we’re adding is a float. The test 1 == 1.0 is True, so 1.0 is
considered to be already in the set.

set objects have a wide range of methods corresponding to the properties of mathematical
sets; the most useful are illustrated in Table 4.2, which uses the following terms
from set theory:

• The cardinality of a set, $|A|$, is the number of elements it contains.

• Two sets are equal if they both contain the same elements.

• Set A is a subset of set B ($A \subseteq B$) if all the elements of A are also elements of B;
set B is said to be a superset of set A.

• Set A is a proper subset of B ($A \subset B$) if it is a subset of B but not equal to B; in
this case, set B is said to be a proper superset of A.

• The union of two sets ($A \cup B$) is the set of all elements from both of them.

• The intersection of two sets ($A \cap B$) is the set of all elements they have in
common.
Table 4.2 set methods

Method                             Description
-------------------------------    --------------------------------------------
isdisjoint(other)                  Is set disjoint with other?
issubset(other),                   Is set a subset of other?
  set <= other
set < other                        Is set a proper subset of other?
issuperset(other),                 Is set a superset of other?
  set >= other
set > other                        Is set a proper superset of other?
union(other),                      The union of set and other(s)
  set | other | ...
intersection(other),               The intersection of set and other(s)
  set & other & ...
difference(other),                 The difference of set and other(s)
  set - other - ...
symmetric_difference(other),       The symmetric difference of set and other(s)
  set ^ other ^ ...

• The difference of set A and set B ($A \setminus B$) is the set of elements in A that are not in B.

• The symmetric difference of two sets, $A \triangle B$, is the set of elements in either but
not in both.

• Two sets are said to be disjoint if they have no elements in common.
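
The definitions above can be checked against one another with a few small sets. This
is a sketch using made-up sets A, B and C; each printed comparison should be True
unless the comment says otherwise.

```python
A = {1, 2, 3}
B = {1, 2, 3, 4}
C = {3, 4, 5, 6}

print(len(A))                          # cardinality of A: 3
print(A <= B, A < B, B < B)            # True True False: B is not a proper subset of itself

# The symmetric difference is the union with the intersection removed.
print(B ^ C == (B | C) - (B & C))

# The difference B \ C removes from B only the elements it shares with C.
print(B - C == B - (B & C))

# A shares no elements with {5, 6}, so the two sets are disjoint.
print(A.isdisjoint({5, 6}))
```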

There are two forms for most set expressions: the operator-like syntax requires all
arguments to be set objects, whereas explicit method calls will convert any iterable
argument into a set.

>>> A = set((1, 2, 3))
>>> B = set((1, 2, 3, 4))
>>> A <= B
True
>>> A.issubset((1, 2, 3, 4))     # OK: (1, 2, 3, 4) is turned into a set
True
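
The difference between the two forms shows up as soon as one operand is not a set.
In this brief sketch, the method call accepts a list but the operator raises a
TypeError; the variable names are chosen for illustration.

```python
A = {1, 2, 3}

# The method form converts its argument, so a list is acceptable.
merged = A.union([3, 4])
print(merged == {1, 2, 3, 4})

# The operator form insists that both operands are sets.
try:
    A | [3, 4]
    operator_failed = False
except TypeError as err:
    operator_failed = True
    print('TypeError:', err)
```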


Some more examples:

>>> C, D = set((3, 4, 5, 6)), set((7, 8, 9))
>>> B | C     # union
{1, 2, 3, 4, 5, 6}
>>> A | C | D     # union of three sets
{1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> A & C     # intersection
{3}
>>> C & D
set()     # the empty set
>>> C.isdisjoint(D)
True
>>> B - C     # difference
{1, 2}
>>> B ^ C     # symmetric difference
{1, 2, 5, 6}
◊ frozensets

sets are mutable objects (items can be added to and removed from a set); because
of this they are unhashable and so cannot be used as dictionary keys or as members of
other sets.

>>> a = set((1,2,3))
>>> b = set(('q', (1,2), a))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'
>>>

(In the same way, lists cannot be dictionary keys or set members.) There is, however,
a frozenset object which is a kind of immutable (and hashable) set. 5 frozensets
are fixed, unordered collections of unique objects and can be used as dictionary keys
and set members.

>>> a = frozenset((1,2,3))
>>> b = set(('q', (1,2), a))     # OK: the frozenset a is hashable
>>> b.add(4)                     # OK: b is a regular set
>>> a.add(4)                     # Not OK: frozensets are immutable
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'add'
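
One situation in which a frozenset is a natural dictionary key is where the key is an
unordered pair. The sketch below uses made-up, purely illustrative numbers keyed by
pairs of atom symbols; because the key is a set, the order in which the pair is given
does not matter.

```python
# Illustrative values only, keyed by an unordered pair of atom symbols.
bond_length = {
    frozenset(('C', 'H')): 1.09,
    frozenset(('C', 'O')): 1.43,
}

# ('H', 'C') and ('C', 'H') produce equal frozensets, hence the same key.
print(bond_length[frozenset(('H', 'C'))])      # 1.09
print(frozenset(('O', 'C')) in bond_length)    # True
```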


Example E4.8 A Mersenne prime, $M_i$, is a prime number of the form $M_i = 2^i - 1$. The
set of Mersenne primes less than $n$ may be thought of as the intersection of the set of all
primes less than $n$, $P_n$, with the set, $A_n$, of integers satisfying $2^i - 1 < n$.

The following program returns a list of the Mersenne primes less than 1000000. 

Listing 4.2 The Mersenne primes

import math

def primes(n):
    """ Return a list of the prime numbers <= n. """
    sieve = [True] * (n // 2)
    for i in range(3, int(math.sqrt(n))+1, 2):
        if sieve[i//2]:
            sieve[i*i//2::i] = [False] * ((n - i*i - 1) // (2*i) + 1)
    return [2] + [2*i+1 for i in range(1, n // 2) if sieve[i]]

n = 1000000
P = set(primes(n))                                # ➊

# A list of integers 2**i-1 <= n
A = []
for i in range(2, int(math.log(n+1, 2))+1):       # ➋
    A.append(2**i - 1)

# The set of Mersenne primes as the intersection of P and A
M = P.intersection(A)

# Output as a sorted list of M
print(sorted(list(M)))                            # ➌

5 In a sense, they are to sets what tuples are to lists.


The prime numbers are produced in a list by the function primes, which implements
an optimized version of the Sieve of Eratosthenes algorithm (see Exercise P2.5.8); this
is converted into the set, P (➊). We can take the intersection of this set with any iterable
object using the intersection method, so there is no need to explicitly convert our
second list of integers, A (➋), into a set.

➌ Finally, the set of Mersenne primes we create, M, is an unordered collection, so for
output purposes we convert it into a sorted list.

For n = 1000000, the output is

[3, 7, 31, 127, 8191, 131071, 524287] 


4.2.4 Exercises 
Questions 

Q4.2.1 Write a one-line Python program to determine if a string is a pangram (a string 
that contains each letter of the alphabet at least once). 

Q4.2.2 Write a function, using set objects, to remove duplicates from an ordered 
list. For example, 

>>> remove_dupes([1,1,2,3,4,4,4,5,7,8,8,9])

[1, 2, 3, 4, 5, 7, 8, 9] 

Q4.2.3 Predict and explain the effect of the following statements: 

>>> set('hellohellohello') 

>>> set(['hellohellohello']) 

>>> set(('hellohellohello')) 

>>> set(('hellohellohello',)) 

>>> set(('hello', 'hello', 'hello')) 

>>> set(('hello', ('hello', 'hello'))) 

>>> set(('hello', ['hello', 'hello'])) 


Q4.2.4 If frozenset objects are immutable, how is this possible?

>>> a = frozenset((1,2,3))
>>> a |= {2,3,4,5}
>>> print(a)
frozenset({1, 2, 3, 4, 5})

Table 4.3 Resistor color codes

Color     Abbreviation   Significant figures   Multiplier   Tolerance
Black     bk             0                     1            -
Brown     br             1                     10           ±1%
Red       rd             2                     10^2         ±2%
Orange    or             3                     10^3         -
Yellow    yl             4                     10^4         ±5%
Green     gr             5                     10^5         ±0.5%
Blue      bl             6                     10^6         ±0.25%
Violet    vi             7                     10^7         ±0.1%
Gray      gy             8                     10^8         ±0.05%
White     wh             9                     10^9         -
Gold      au             -                     -            ±5%
Silver    ag             -                     -            ±10%
None      --             -                     -            ±20%


Problems 

P4.2.1 The values and tolerances of older resistors are identified by four colored bands:
the first two indicate the first two significant figures of the resistance in ohms, the third
denotes a decimal multiplier (number of zeros), and the fourth indicates the tolerance.
The colors and their meanings for each band are listed in Table 4.3.

For example, a resistor with colored bands violet, yellow, red, green has value
74 × 10^2 = 7400 Ω and tolerance ±0.5%.

Write a program that defines a function to translate a list of four color abbreviations
into a resistance value and a tolerance. For example,

In [x]: print(get_resistor_value(['vi', 'yl', 'rd', 'gr']))

Out[x]: (7400, 0.5) 


P4.2.2 The novel Moby-Dick is out of copyright and can be downloaded as a text
file from the Project Gutenberg website at www.gutenberg.org/2/7/0/2701/. Write a
program to output the 100 words most frequently used in the book by storing a count of
each word encountered in a dictionary.

Hint: use Python’s string methods to strip out any punctuation. It suffices to replace
any instances of the following characters with the empty string:

When you have a dictionary with words as the keys and the corresponding word counts
as the values, create a list of (count, word) tuples and sort it.

Bonus exercise: compare the frequencies of the top 2000 words in Moby-Dick with
the prediction of Zipf’s Law:

$$\log f(w) = \log C - a \log r(w),$$

where $f(w)$ is the number of occurrences of word $w$, $r(w)$ is the corresponding rank
(1 = most common, 2 = second most common, etc.) and $C$ and $a$ are constants. In the
traditional formulation of the law, $\log C = \log f(w_1)$ and $a = 1$, where $w_1$ is the most
common word, such that $r(w_1) = 1$.


P4.2.3 Reverse Polish notation (RPN) (or postfix notation) is a notation for mathematical
expressions in which each operator follows all of its operands (in contrast to the more
familiar infix notation, in which the operator appears between the operands it acts on).
For example, the infix expression 5 + 6 is written in RPN as 5 6 +. The advantage of
this approach is that parentheses are not necessary: to evaluate (3 + 7) / 2, it may be
written as 3 7 + 2 /. An RPN expression is evaluated left to right with the intermediate
values pushed onto a stack - a last-in, first-out list of values - and retrieved (popped)
from the stack when needed by an operator. Thus, the expression 3 7 + 2 / proceeds
with 3 and then 7 pushed to the stack (with 7 on top). The next token is +, so the values
are retrieved, added, and the result, 10, pushed onto the (now empty) stack. Next, 2 is
pushed to the stack. The final token / pops the two values, 10 and 2, from the stack, and
divides them to give the result, 5.

Write a program to evaluate an RPN expression consisting of space-delimited tokens
(the operators + - * / ** and numbers).

Hint: parse the expression into a list of string tokens and iterate over it, converting and
pushing the numbers to the stack (which may be implemented by appending to a list).
Define functions to carry out the operations by retrieving values from the stack with
pop. Note that Python does not provide a switch ... case syntax, but these function
objects can be the values in a dictionary with the operator tokens as the keys.


P4.2.4 Use the dictionary of Morse code symbols available from
scipython.com/ex/adb to write a program that can translate a message to and from
Morse code, using spaces to delimit individual Morse code “letters” and slashes ('/')
to delimit words. For example, 'python 3' becomes .--. -.-- - .... --- -. / ...--


P4.2.5 The file shark-species.txt, available at scipython.com/ex/adc, contains
a list of extant shark species arranged in a hierarchy by order, family, genus and species
(with the species given as binomial name : common name). Read the file into a data
structure of nested dictionaries, which can be accessed as follows:

>>> sharks['Lamniformes']['Lamnidae']['Carcharodon']['C. carcharias']
Great white shark


4.3 Pythonic idioms: “syntactic sugar” 

Many computer languages provide syntax to make common tasks easier and clearer to
code. Such syntactic sugar consists of constructs that could be removed from the
language without affecting the language’s functionality. We have already seen one
example in so-called augmented assignment: a += 1 is equivalent to a = a + 1. Another
example is negative indexing of sequences: b[-1] is equivalent to and more convenient
than b[len(b)-1].


4.3.1 Comparison and assignment shortcuts 

If more than one variable is to be assigned to the same object, the shortcut 

x = y = z = -1 

may be used. Note that if mutable objects are assigned this way, the variable names will
all refer to the same object, not to distinct copies of it (recall Section 2.4.1).
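The consequence for mutable objects can be seen in a short sketch (illustrative only): a list assigned with the shortcut is shared by all three names, whereas lists created separately are independent.

```python
x = y = z = []          # three names, one list object
x.append(1)
print(y, z)             # [1] [1]
print(x is y is z)      # True

# To obtain distinct lists, create each one separately.
x, y, z = [], [], []
x.append(1)
print(y, z)             # [] []
```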

Similarly, as was shown in Section 2.4.2, multiple assignments to different objects
can be achieved in a single line by tuple unpacking:

a, b, c = x + 1, 'hello', -4.5 

The tuple on the right-hand side of this expression (parentheses are optional in this case) 
is unpacked in the assignment to the variable names on the left-hand side. This single 
line is thus equivalent to the three lines 

a = x + 1 
b = 'hello' 
c = -4.5 


In expressions such as these the right-hand side is evaluated first and then assigned
to the left-hand side. As we have seen in Section 2.4.2, this provides a very useful
way of swapping the value of two variables without the need for a temporary
variable:

a, b = b, a 

Comparisons may also be chained together in a natural way: 

if a == b == 3:
    print('a and b both equal 3')
if -1 < x < 1:
    print('x is between -1 and 1')
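A chained comparison gives the same result as joining the pairwise comparisons with and. The following minimal sketch checks this for a few values; the function name in_open_interval and its default limits are our own, chosen for illustration.

```python
def in_open_interval(x, lo=-1, hi=1):
    """Return True if lo < x < hi, using a chained comparison."""
    return lo < x < hi

for x in (-2, 0.5, 1):
    # The chained form gives the same result as the explicit "and" form.
    assert in_open_interval(x) == (-1 < x and x < 1)
    print(x, in_open_interval(x))
```

Only the middle value, 0.5, lies strictly between -1 and 1, so it is the only one for which True is printed.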

Python supports conditional assignment: a variable name can be set to one value or
another depending on the outcome of an if ... else expression on the same line as
the assignment. For example,

y = math.sin(x)/x if x else 1 

Short examples such as this one, in which the potential division by zero is avoided
(recall that 0 evaluates to False) are benign enough, but the idiom should be avoided
for anything more complex in favor of a more explicit construct such as

try:
    y = math.sin(x)/x
except ZeroDivisionError:
    y = 1
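For a case as short as this one, the conditional expression can be wrapped in a small function and checked against the expected limiting value. This is a sketch only; the function name sinc is our own choice and is not defined in the text.

```python
import math

def sinc(x):
    """Return sin(x)/x, taking the limiting value 1 at x = 0."""
    # The whole quotient is the "if" branch: (math.sin(x) / x) if x else 1
    return math.sin(x) / x if x else 1

print(sinc(0))              # 1
print(sinc(math.pi / 2))    # 2/pi, approximately 0.6366
```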
4.3.2 List comprehension

A list comprehension in Python is a construct for creating a list based on another iterable
object in a single line of code. For example, given a list of numbers, xlist, a list of the
squares of those numbers may be generated as follows:

>>> xlist = [1, 2, 3, 4, 5, 6]
>>> x2list = [x**2 for x in xlist]
>>> x2list
[1, 4, 9, 16, 25, 36]

This is a faster and syntactically nicer way of creating the same list than with a block
of code within a for loop:

>>> x2list = []
>>> for x in xlist:
...     x2list.append(x**2)


List comprehensions can also contain conditional statements: 

>>> x2list = [x**2 for x in xlist if x % 2]
>>> x2list
[1, 9, 25]

Here, x gets fed to the x**2 expression to be entered into the x2list under construction
only if x % 2 evaluates to True (i.e., if x is odd). This is an example of a filter (a single
if conditional expression). If you require a more complex mapping of values in the
original sequence to values in the constructed list, the if ... else expression must
appear before the for loop:

>>> [x**2 if x % 2 else x**3 for x in xlist]
[1, 8, 9, 64, 25, 216]

This comprehension squares the odd integers and cubes the even integers in xlist.

Of course, the sequence used to construct the list does not have to be another list.
For example, strings, tuples and range objects are all iterable and can be used in list
comprehensions:

>>> [x**3 for x in range(1,10)]
[1, 8, 27, 64, 125, 216, 343, 512, 729]
>>> [w.upper() for w in 'abc xyz']
['A', 'B', 'C', ' ', 'X', 'Y', 'Z']

Finally, list comprehensions can be nested. For example, the following code flattens 
a list of lists: 

>>> vlist = [[1,2,3], [4,5,6], [7,8,9]] 

>>> [c for v in vlist for c in v] 

[1, 2, 3, 4, 5, 6, 7, 8, 9] 

Here, the first loop produces the inner lists, one by one, as v, and each inner list v is 
iterated over as c to be added to the list being created. 
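The order of the for clauses in a nested comprehension matches the order in which the equivalent loops would be written. A minimal sketch of this equivalence (the name flat is ours):

```python
vlist = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# The for clauses of the comprehension nest in the order they are written:
# the first clause is the outer loop, the second is the inner loop.
flat = []
for v in vlist:
    for c in v:
        flat.append(c)

print(flat == [c for v in vlist for c in v])    # True
```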
Example E4.9 Consider a 3 × 3 matrix represented by a list of lists:

M = [[1,2,3],
     [4,5,6],
     [7,8,9]]

Without using list comprehension, the transpose of this matrix could be built up by
looping over the rows and columns:

MT = [[0,0,0], [0,0,0], [0,0,0]]
for ir in range(3):
    for ic in range(3):
        MT[ic][ir] = M[ir][ic]

With one list comprehension, the transpose can be constructed as

MT = []
for i in range(3):
    MT.append([row[i] for row in M])

where rows of the transposed matrix are built from the columns (indexed with i=0, 1, 2)
of each row in turn from M. The outer loop here can be expressed as a list comprehension
of its own:

MT = [[row[i] for row in M] for i in range(3)]

Note, however, that NumPy provides a much easier way to manipulate matrices (see 
Section 6.6). 
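As an aside that is not part of the example above, a commonly used pure-Python alternative relies on the zip built-in together with argument unpacking (the * syntax). The sketch below assumes the same matrix M.

```python
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# zip(*M) groups the i-th items of every row, i.e. it yields the columns
# of M as tuples; each tuple is converted back to a list.
MT = [list(column) for column in zip(*M)]
print(MT)    # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```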


4.3.3 lambda functions 

A lambda function in Python is a type of simple anonymous function. The executable
body of a lambda function must be an expression and not a statement; that is, it may not
contain, for example, loop blocks, conditional blocks or assignment statements. lambda
functions provide limited support for a programming paradigm known as functional
programming. The simplest application of a lambda function differs little from the way a
regular function def would be used:

>>> f = lambda x: x**2 - 3*x + 2
>>> print(f(4.))
6.0

The argument is passed to x and the result of the function specified in the lambda
definition after the colon is passed back to the caller. To pass more than one argument
to a lambda function, pass a tuple (without parentheses):

>>> f = lambda x,y: x**2 + 2*x*y + y**2
>>> f(2., 3.)
25.0


6 Functional programming is a style of programming in which computation is achieved through the evaluation 
of mathematical functions with minimal reference to variables defining the state of the program. 
In these examples, not too much is gained by using a lambda function, and the
functions defined are not all that anonymous either (because they’ve been bound to 
the variable name f). A more useful application is in creating a list of functions, as in 
the following example. 


Example E4.10 Functions are objects (like everything else in Python) and so can be
stored in lists. Without using lambda we would have to define named functions (using
def) before constructing the list:

def const(x):
    return 1.
def lin(x):
    return x
def square(x):
    return x**2
def cube(x):
    return x**3

flist = [const, lin, square, cube]

Then flist[3](5) returns 125, since flist[3] is the function cube, and is called
with the argument 5.

The value of using lambda expressions as anonymous functions is that these
functions do not need to be named if they are just to be stored in a list, and so can
be defined as items “inline” with the list construction:

>>> flist = [lambda x: 1,
...          lambda x: x,
...          lambda x: x**2,
...          lambda x: x**3]
>>> flist[3](5)     # flist[3] is x**3
125
>>> flist[2](4)     # flist[2] is x**2
16


Example E4.11 The sorted built-in and sort list method can order lists based on the 
returned value of a function called on each element prior to making comparisons. This 
function is passed as the key argument. For example, sorting a list of strings is case 
sensitive by default: 

>>> sorted('Nobody expects the Spanish Inquisition'.split()) 

['Inquisition', 'Nobody', 'Spanish', 'expects', 'the'] 

We can make the sorting case insensitive, however, by passing each word to the
str.lower method:

>>> sorted('Nobody expects the Spanish Inquisition'.split(), key=str.lower)
['expects', 'Inquisition', 'Nobody', 'Spanish', 'the']

(Of course, key=str.upper would work just as well.) Note that the list elements
themselves are not altered: they are being ordered based on a lowercase version of
themselves. We do not use parentheses here, as in str.lower(), because we are
passing the function itself to the key argument, not calling it directly.

It is typical to use lambda expressions to provide simple anonymous functions for
this purpose. For example, to sort a list of atoms as (element symbol, atomic number)
tuples in order of atomic number (the second item in each tuple):

>>> halogens = [('At', 85), ('Br', 35), ('Cl', 17), ('F', 9), ('I', 53)]
>>> sorted(halogens, key=lambda e: e[1])
[('F', 9), ('Cl', 17), ('Br', 35), ('I', 53), ('At', 85)]

Here, the sorting algorithm calls the function specified by key on each tuple item to
decide where it belongs in the sorted list. Our anonymous function simply returns the
second element of each tuple, and so sorting is by atomic number.
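The same key function can be reused with the sort list method and with other built-ins that accept a key argument. The following sketch is an illustration under our own choice of variable names (symbols, lightest); it sorts the list in place in descending order and finds the lightest halogen with min.

```python
halogens = [('At', 85), ('Br', 35), ('Cl', 17), ('F', 9), ('I', 53)]

# Sort in place by decreasing atomic number.
halogens.sort(key=lambda e: e[1], reverse=True)
symbols = [symbol for symbol, atomic_number in halogens]
print(symbols)      # ['At', 'I', 'Br', 'Cl', 'F']

# min and max accept a key function in the same way as sorted.
lightest = min(halogens, key=lambda e: e[1])
print(lightest)     # ('F', 9)
```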


4.3.4 The with statement 

The with statement creates a block of code that is executed within a certain context. 
A context is defined by a context manager that provides a pair of methods describing 
how to enter and leave the context. User-defined contexts are generally used only in 
advanced code and can be quite complex, but a common basic example of a built-in 
context manager involves file input/output. Here, the context is entered by opening the
file. Within the context block, the file is read from or written to, and finally the file is
closed on exiting the context. The file object is a context manager that is returned by
the open() built-in function. It defines an exit method which simply closes the file (if it
was opened successfully), so that this does not need to be done explicitly. To open a file
within a context, use

with open('filename') as f:
    # process the file in some way, for example:
    lines = f.readlines()

The reason for doing this is that you can be sure that the file will be closed after the 
with block, even if something goes wrong in this block: the context manager handles 
the code you would otherwise have to write to catch such runtime errors. 
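To make the comparison concrete, the sketch below spells out roughly what the with statement does on our behalf, using an explicit try ... finally block. It is an approximation for illustration, not the exact mechanism; the temporary file and its contents are invented for the example.

```python
import os
import tempfile

# Create a small temporary file to read back (the contents are arbitrary).
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, 'w') as f:
    f.write('line 1\nline 2\n')

# Roughly what the with statement arranges for us when reading the file:
f = open(path)
try:
    lines = f.readlines()
finally:
    f.close()       # executed whether or not an exception was raised

print(lines)        # ['line 1\n', 'line 2\n']
print(f.closed)     # True
```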


4.3.5 Generators 

Generators are a powerful feature of the Python language; they allow one to declare a
function that behaves like an iterable object: that is, a function that can be used in a
for loop and that will yield its values, in turn, on demand. This is often more efficient
than calculating and storing all of the values that will be iterated over (particularly
if there will be a very large number of them). A generator function looks just like a
regular Python function, but instead of exiting with a return value, it contains a yield
statement which returns a value each time it is required to by the iteration.

A very simple example should make this clearer. Let’s define a generator, count, to
count to n:

>>> def count(n):
...     i = 0
...     while i < n:
...         i += 1
...         yield i
...
>>> for j in count(5):
...     print(j)
...
1
2
3
4
5

Note that we can’t simply call our generator like a regular function:

>>> count(5)
<generator object count at 0x102d8e6e0>

The generator count is expecting to be called as part of a loop (here, the for loop)
and on each iteration it yields its result and stores its state (the value of i reached)
until the loop next calls upon it.
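The way a generator stores its state between iterations can be observed by stepping through it by hand with the next built-in, which is what a for loop does behind the scenes. A minimal sketch (the names counter, first, second and exhausted are ours):

```python
def count(n):
    """Yield the integers 1, 2, ..., n in turn."""
    i = 0
    while i < n:
        i += 1
        yield i

counter = count(2)      # nothing in the function body has run yet
first = next(counter)   # runs until the first yield
second = next(counter)  # resumes after the yield, with i remembered
print(first, second)    # 1 2

try:
    next(counter)       # the while loop ends: there are no more values
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)        # True
```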

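In fact, we have been using something very similar already: the familiar range built-in
is, in Python 3, a lazily evaluated object that produces its values on demand in much the
same way (strictly speaking, it is an immutable sequence type rather than a generator).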

There is a generator comprehension syntax similar to list comprehension (use round 
brackets instead of square brackets): 

>>> squares = (x**2 for x in range(5))
>>> for square in squares:
...     print(square)
...
0
1
4
9
16

However, once we have “exhausted” our generator comprehension defined in this way,
we cannot iterate over it again without redefining it. If we try:

>>> for square in squares:
...     print(square)
...
>>>

we get nothing as we have already reached the end of the squares generator.

To obtain a list or tuple of a generator’s values, simply pass it to list or tuple, as 
shown in the following example. 


Example E4.12 This function defines a generator for the triangular numbers,
$T_n = \sum_{k=1}^{n} k = 1 + 2 + 3 + \cdots + n$, for $n = 0, 1, 2, \cdots$: that is,
$T_n = 0, 1, 3, 6, 10, \cdots$.

>>> def triangular_numbers(n):
...     i, t = 1, 0
...     while i <= n:
...         yield t
...         t += i
...         i += 1
...
>>> list(triangular_numbers(15))
[0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105]

Note that the statements after the yield statement are executed each time
triangular_numbers resumes. The call to triangular_numbers(15) returns
an iterator that feeds these numbers into list to generate a list of its values.
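Because values are produced only on demand, a generator need not have an upper limit at all. The following variation is our own sketch, not part of the example above: the generator all_triangular_numbers never terminates, and itertools.islice from the standard library is used to take just the first few values.

```python
from itertools import islice

def all_triangular_numbers():
    """Yield the triangular numbers 0, 1, 3, 6, ... without end."""
    i, t = 1, 0
    while True:
        yield t
        t += i
        i += 1

# islice takes only the first five values from the endless generator.
first_five = list(islice(all_triangular_numbers(), 5))
print(first_five)    # [0, 1, 3, 6, 10]
```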


4.3.6 ◊ map

The built-in function map returns an iterator that applies a given function to every item 
of a provided sequence, yielding the results as a generator would. 7 For example, one 
way to sum a list of lists is to map the sum built-in to it: 

>>> mylists = [[1,2,3], [10, 20, 30], [25, 75, 100]] 

>>> list(map(sum, mylists)) 

[6, 60, 200] 

(We have to cast explicitly back to a list because map returns a generator-like
object.) This statement is equivalent to the list comprehension:

>>> [sum(l) for l in mylists]
[6, 60, 200]
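The generator-like nature of the object returned by map can be seen by consuming it one step at a time. A minimal sketch, with variable names (totals, first, rest, again) chosen for illustration:

```python
mylists = [[1, 2, 3], [10, 20, 30], [25, 75, 100]]

totals = map(sum, mylists)  # no sums have been calculated yet
first = next(totals)        # calculates the first sum only
rest = list(totals)         # calculates the remaining sums
again = list(totals)        # the iterator is now exhausted

print(first, rest, again)   # 6 [60, 200] []
```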

map is occasionally useful but has the potential to create very obscure code, and list
or generator comprehensions are generally to be preferred. The same applies to the
filter built-in, which constructs an iterator from the elements of a given sequence for
which a provided function returns True. In the following example, the odd integers less
than 10 are generated: the lambda function returns x % 2, and this expression evaluates
to 0 (equivalent to False) if x is even:

>>> list(filter(lambda x: x%2, range(10)))

[1, 3, 5, 7, 9] 

Again, the list comprehension is more expressive: 

>>> [x for x in range(10) if x % 2] 

[1, 3, 5, 7, 9] 


4.3.7 Exercises 
Questions 

Q4.3.1 Rewrite the list of lambda functions created in Example E4.10 using a single 
list comprehension. 


7 Constructs such as map are frequently used in functional programming. 
Q4.3.2 What does the following code do and how does it work?

>>> nmax = 5
>>> x = [1]
>>> for n in range(1,nmax+2):
...     print(x)
...     x = [([0]+x)[i] + (x+[0])[i] for i in range(n+1)]


Q4.3.3 Consider the lists 


>>> a = ['A', 'B', 'C', 'D', 'E', 'F', 'G'] 

>>> b = [4, 2, 6, 1, 5, 0, 3] 

Predict and explain the output of the following statements: 


a. [a[x] for x in b]

b. [a[x] for x in sorted(b)]

c. [a[b[x]] for x in b]

d. [x for (y,x) in sorted(zip(b,a))]

Q4.3.4 Dictionaries are unsorted data structures. Write a one-line Python statement
returning a list of (key, value) pairs sorted by key. Assume that all keys have the same
data type (why is this important?). Repeat the exercise to produce a list ordered by
dictionary values.

Q4.3.5 In the television series The Wire, drug dealers encrypt telephone numbers with 
a simple substitution cipher based on the standard layout of the phone keypad. Each
digit of the number, with the exception of 5 and 0, is replaced with the corresponding 
digit on the other side of the 5 key (“jump the five”); 5 and 0 are exchanged. Thus, 555- 
867-5309 becomes 000-243-0751. Devise a one-line statement to encrypt and decrypt 
numbers encoded in this way. 


Problems 


P4.3.1 Use a list comprehension to calculate the trace of the matrix M (that is, the sum 
of its diagonal elements). Hint: the sum built-in function takes an iterable object and 
sums its values. 

P4.3.2 The ROT13 substitution cipher encodes a string by replacing each letter with 
the letter 13 letters after it in the alphabet (cycling around if necessary). For example,
a → n and p → c.

a. Given a word expressed as a string of lowercase characters only, use a list
comprehension to construct the ROT13-encoded version of that string. Hint: Python
has a built-in function, ord, which converts a character to its Unicode code point
(e.g., ord('a') returns 97); another built-in, chr, is the inverse of ord (e.g.,
chr(122) returns 'z').

b. Extend your list comprehension to encode sentences of words (in lowercase)
separated by spaces into a ROT13 sentence (in which the encoded words are
also separated by spaces).

Figure 4.1 Rule 30 of Wolfram’s one-dimensional two-state cellular automata and the first seven
generations.


P4.3.3 In A New Kind of Science, 8 Stephen Wolfram describes a set of simple
one-dimensional cellular automata in which each cell can take one of two values: ‘on’ or
‘off’. A row of cells is initialized in some state (e.g., with a single ‘on’ cell somewhere
in the row) and it evolves into a new state according to a rule that determines the
subsequent state of a cell (‘on’ or ‘off’) from its value and that of its two nearest
neighbors. There are $2^3 = 8$ different states for these three “parent” cells taken together
and so $2^8 = 256$ different automata rules; that is, the state of cell $i$ in the next generation
is determined by the states of cells $i - 1$, $i$ and $i + 1$ in the present generation.

These rules are numbered 0-255 according to the binary number indicated by the
eight different outcomes each one specifies for the eight possible parent states. For
example, rule 30 produces the outcome (off, off, off, on, on, on, on, off) (or 00011110)
from the parent states given in the order shown in Figure 4.1. The evolution of the cells
can be illustrated by printing the row corresponding to each generation under its parent 
as shown in this figure. 

Write a program to display the first few rows generated by rule 30 on the command 
line, starting from a single ‘on’ cell in the center of a row 80 cells wide. Use an asterisk 
to indicate an ‘on’ cell and a space to represent an ‘off’ cell. 


P4.3.4 The file iban_lengths.txt, available at scipython.com/ex/add, contains
two columns of data: a two-letter country code and the length of that country’s
International Bank Account Number (IBAN):

AL 28
AD 24
...
GB 22

8 S. Wolfram (2002). A New Kind of Science, Wolfram Media.
The code snippet below parses the file into a dictionary of lengths, keyed by the
country code:

iban_lengths = {}
with open('iban_lengths.txt') as fi:
    for line in fi.readlines():
        fields = line.split()
        iban_lengths[fields[0]] = int(fields[1])

Use a lambda function and list comprehension to achieve the same goal in (a) two lines, 
(b) one line. 

P4.3.5 The power set of a set S, P(S), is the set of all subsets of S, including the empty
set and S itself. For example,

P({1,2,3}) = {{}, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}}.

Write a Python program that uses a generator to return the power set of a given set. 

Hint: convert your set into an ordered sequence such as a tuple. For each item in this 
sequence return the power set formed from all subsequent items, inclusive and exclusive 
of the chosen item. Don’t forget to convert the tuples back to sets after you’re done. 

P4.3.6 The Brown Corpus is a collection of 500 samples of (American) English-language
text that was compiled in the 1960s for use in the field of computational linguistics.
It can be downloaded from http://nltk.github.com/nltk_data/packages/corpora/brown.zip.

Each sample in the corpus consists of words that have been tagged with their part-of- 
speech after a forward slash. For example, 

The/at football/nn opponent/nn on/in homecoming/nn is/bez ,/, of/in 
course/nn ,/, selected/vbn with/in the/at view/nn that/cs 

Here, The has been tagged as an article (/at), football as a noun (/nn) and so on. A
full list of the tags is available from the accompanying manual. 9

Write a program that analyzes the Brown corpus and returns a list of the eight-letter
words which feature each possible two-letter combination exactly twice. For example,
the two-letter combination pc is present in only the words topcoats and upcoming; mt is
present only in the words boomtown and undreamt.


9 This manual is available at www.hit.uib.no/icame/brown/bcm.html though the tags themselves are presented
better on the Wikipedia article at http://en.wikipedia.org/wiki/Brown_Corpus.


4.4 Operating system services

4.4.1 The sys module

The sys module provides certain system-specific parameters and functions. Many of
them are of interest only to fairly advanced users of less-common Python implementations
(the details of how floating point arithmetic is implemented can vary between
different systems, for example, but is likely to be the same on all common platforms -
see Section 9.1). However, it also provides some that are useful and important: these are
described here.

sys.argv 

sys.argv holds the command line arguments passed to a Python program when it is
executed. It is a list of strings. The first item, sys.argv[0], is the name of the program
itself. This allows for a degree of interactivity without having to read from configuration
files or require direct user input, and means that other programs or shell scripts can call
a Python program and pass it particular input values or settings. For example, a simple
script to square a given number might be written:

# square.py
import sys

n = int(sys.argv[1])
print(n, 'squared is', n**2)

(Note that it is necessary to convert the input value into an int, because it is stored in
sys.argv as a string.) Running this program from the command line with

python square.py 3 

produces the output 

3 squared is 9 

as expected. But because we did not hard-code the value of n, the same program can be 
run with 

python square.py 4 

to produce the output 

4 squared is 16 
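The same approach extends to any number of arguments, since sys.argv is an ordinary list of strings. The sketch below is illustrative only: the function sum_arguments and the script name add.py are hypothetical, and an explicit list stands in for sys.argv so that the example can be run without a command line.

```python
def sum_arguments(argv):
    """Return the sum of the command line arguments, read as floats."""
    # argv[0] is the program name, so the values start at index 1.
    return sum(float(arg) for arg in argv[1:])

# In a script this would be called as sum_arguments(sys.argv) after
# "import sys"; here an explicit list stands in for the command line
#     python add.py 1.5 2 4
print(sum_arguments(['add.py', '1.5', '2', '4']))    # 7.5
```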


sys.exit 

Calling sys.exit will cause a program to terminate and exit from Python. This happens
“cleanly,” so that any commands specified in a try statement’s finally clause
are executed first and any open files are closed. The optional argument to sys.exit
can be any object; if it is an integer, it is passed to the shell which, it is assumed, knows
what to do with it. 10 For example, 0 usually denotes “successful” termination of the
program and nonzero values indicate some kind of error. Passing no argument or None
is equivalent to 0. If any other object is specified as an argument to sys.exit, it is
passed to stderr, Python’s implementation of the standard error stream. A string, for
example, appears as an error message on the console (unless redirected elsewhere by
the shell).

10 At least if it is in the range 0-127; undefined results could be produced for values outside this range.
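The “clean” exit works because sys.exit raises a SystemExit exception rather than stopping the interpreter immediately. In the following sketch the exception is caught purely for illustration (normally it would be left to propagate), showing that the finally clause runs first and that the exit status is carried on the exception as its code attribute.

```python
import sys

log = []
try:
    try:
        sys.exit(2)                 # raises SystemExit with code 2
    finally:
        log.append('cleaned up')    # the finally clause still runs
except SystemExit as err:
    # Catching SystemExit (done here only for illustration) prevents the
    # interpreter from exiting.
    log.append(err.code)

print(log)    # ['cleaned up', 2]
```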
Example E4.13 A common way to help users with scripts that take command line
arguments is to issue a usage message if they get it wrong, as in the following code
example.

Listing 4.3 Issuing a usage message for a script taking command line arguments

# eg4-usage.py
import sys

try:
    n = int(sys.argv[1])
except (IndexError, ValueError):
    sys.exit('Please enter an integer, <n>, on the command line.\nUsage: '
             'python {:s} <n>'.format(sys.argv[0]))
print(n, 'squared is', n**2)

The error message here is reported and the program exits if no command line argument
was specified (and hence indexing sys.argv[1] raises an IndexError) or the
command line argument string does not evaluate to an integer (in which case the int
cast will raise a ValueError).

$ python eg4-usage.py hello
Please enter an integer, <n>, on the command line.
Usage: python eg4-usage.py <n>

$ python eg4-usage.py 5
5 squared is 25


4.4.2 The os module 

The os module provides various operating system interfaces in a platform-independent 
way. Its many functions and parameters are described in full in the official documenta- 
tion, 11 but some of the more important are described in this section. 

Process information 

The Python process is the particular instance of the Python application that is executing 
your program (or providing a Python shell for interactive use). The os module provides 
a number of functions for retrieving information about the context in which the 
Python process is running. For example, os .uname () retums information about the 
operating system running Python and the network name of the machine running 
the process. 

One function is of particular use: os.getenv(key) returns the value of the environment 
variable key if it exists (or None if it doesn’t). Many environment variables are 
system specific, but commonly include 


11 http://docs.python.org/3/library/os.html. 
• HOME: the path to the user’s home directory, 

• PWD: the current working directory, 

• USER: the current user’s username and 

• PATH: the system path environment variable. 

For example, on my system: 

>>> os.getenv('HOME') 

'/Users/christian' 
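
As a minimal sketch of how a missing variable might be handled, the snippet below uses the optional second argument of os.getenv to supply a default. The variable name MY_APP_DATA_DIR and the paths are hypothetical, chosen only for illustration.

```python
import os

# MY_APP_DATA_DIR is a made-up variable name used only for this illustration.
os.environ.pop('MY_APP_DATA_DIR', None)      # make sure it starts out unset

missing = os.getenv('MY_APP_DATA_DIR')       # None: the variable is not set
data_dir = os.getenv('MY_APP_DATA_DIR', '/tmp')   # fall back to a default

# os.environ behaves like a dictionary of the same environment variables.
os.environ['MY_APP_DATA_DIR'] = '/data'
updated = os.getenv('MY_APP_DATA_DIR')

print(missing, data_dir, updated)
```

Supplying a default avoids having to test for None before using the value.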


File system commands 

It is often useful to be able to navigate the system directory tree and manipulate 
files and directories from within a Python program. The os module provides the 
functions listed in Table 4.4 to do just this. There are, of course, inherent dangers: your 
Python program can do anything that your user can, including renaming and deleting 
files. 

Pathname manipulations 

The os.path module provides a number of useful functions for manipulating pathnames. 
The version of this library installed with Python will be the one appropriate for 
the operating system that it runs on (e.g., on a Windows machine, path name components 
are separated by the backslash character, ‘\’, whereas on Unix and Linux systems, 
the (forward) slash character, ‘/’, is used). 

Common uses of the os.path module’s functions are to find the filename from a 
path (basename), test to see if a file or directory exists (exists), join strings together 
to make a path (join), split a filename into a ‘root’ and an ‘extension’ (splitext) and 
to find the time of last modification to a file (getmtime). Such common applications 
are described briefly in Table 4.5. 


Table 4.4 os module: file system commands 

Function                       Description 

os.listdir(path='.')           List the entries in the directory given by path (or the 
                               current working directory if this is not specified). 

os.remove(path)                Delete the file path (raises an OSError if path is a 
                               directory; use os.rmdir instead). 

os.rename(old_name, new_name)  Rename the file or directory old_name to new_name. If a 
                               file with the name new_name already exists, it will be 
                               overwritten (subject to user permissions). 

os.rmdir(path)                 Delete the directory path. If the directory is not empty, 
                               an OSError is raised. 

os.mkdir(path)                 Create the directory named path. 

os.system(command)             Execute command in a subshell. If the command generates 
                               any output, it is redirected to the interpreter standard 
                               output stream, stdout. 

Table 4.5 os.path module: common pathname manipulations 

Function                          Description 

os.path.basename(path)            Return the basename of the pathname path giving a 
                                  relative or absolute path to the file: this usually 
                                  means the filename. 

os.path.dirname(path)             Return the directory of the pathname path. 

os.path.exists(path)              Return True if the directory or file path exists, and 
                                  False otherwise. 

os.path.getmtime(path)            Return the time of last modification of path. 

os.path.getsize(path)             Return the size of path in bytes. 

os.path.join(path1, path2, ...)   Return a pathname formed by joining the path 
                                  components path1, path2, etc. with the directory 
                                  separator appropriate to the operating system being 
                                  used. 

os.path.split(path)               Split path into a directory and a filename, returned 
                                  as a tuple (equivalent to calling dirname and 
                                  basename, respectively). 

os.path.splitext(path)            Split path into a ‘root’ and an ‘extension’ (returned 
                                  as a tuple pair). 


Some examples referring to a file /home/brian/test.py: 

>>> os.path.basename('/home/brian/test.py') 
'test.py'                        # Just the filename 
>>> os.path.dirname('/home/brian/test.py') 
'/home/brian'                    # Just the directory 
>>> os.path.split('/home/brian/test.py') 
('/home/brian', 'test.py')       # Directory and filename in a tuple 
>>> os.path.splitext('/home/brian/test.py') 
('/home/brian/test', '.py')      # File path stem and extension in a tuple 
>>> os.path.join(os.getenv('HOME'), 'test.py') 
'/home/brian/test.py'            # Join directories and/or filename 
>>> os.path.exists('/home/brian/test.py') 
False                            # File does not exist! 

Trying to call some of these functions on a path that does not exist will cause a 
FileNotFoundError exception to be raised (which could be caught within a try ... 
except clause, of course). 
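
The behavior just described can be sketched as follows. This is an illustration only: it works inside a temporary directory (created with the standard tempfile module, which is not otherwise covered here) so that no real files are touched, and the filename test.py is arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'test.py')

    # Nothing has been written yet, so the path does not exist ...
    exists_before = os.path.exists(path)

    # ... and asking for its modification time raises FileNotFoundError.
    try:
        os.path.getmtime(path)
        raised = False
    except FileNotFoundError:
        raised = True

    # Create the file; now its size can be queried.
    with open(path, 'w') as f:
        f.write('print("hello")\n')
    size = os.path.getsize(path)

    # Separate the filename into its root and extension.
    root, ext = os.path.splitext(os.path.basename(path))

print(exists_before, raised, size > 0, root, ext)
```

Testing with os.path.exists first, or catching the exception afterwards, are both reasonable; the second avoids a gap between the check and the use of the file.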


Example E4.14 Suppose you have a directory of data files identified by filenames 
containing a date in the form data-DD-Mon-YY.txt where DD is the two-digit day number, 
Mon is the three-letter month abbreviation and YY is the last two digits of the year, for 
example '02-Feb-10'. The following program converts the filenames into the form 
data-YYYY-MM-DD.txt so that an alphanumeric ordering of the filenames puts them 
in chronological order. 
Listing 4.4 Renaming data files by date 

# eg4-osmodule.py
import os
import sys

months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
          'jul', 'aug', 'sep', 'oct', 'nov', 'dec']

dir_name = sys.argv[1]

for filename in os.listdir(dir_name):
    # filename is expected to be in the form 'data-DD-MMM-YY.txt'
    d, month, y = int(filename[5:7]), filename[8:11], int(filename[12:14])
    m = months.index(month.lower())+1    # ➊
    newname = 'data-20{:02d}-{:02d}-{:02d}.txt'.format(y, m, d)
    newpath = os.path.join(dir_name, newname)
    oldpath = os.path.join(dir_name, filename)
    print(oldpath, '->', newpath)
    os.rename(oldpath, newpath)


➊ We get the month number from the index of the corresponding abbreviated month name 
in the list months, adding 1 because Python list indexes start at 0. 

For example, given a directory testdir containing the following files: 

data-02-Feb-10.txt 
data-10-Oct-14.txt 
data-22-Jun-04.txt 
data-31-Dec-06.txt 


the command python eg4-osmodule.py testdir produces the output 

testdir/data-02-Feb-10.txt -> testdir/data-2010-02-02.txt 
testdir/data-10-Oct-14.txt -> testdir/data-2014-10-10.txt 
testdir/data-22-Jun-04.txt -> testdir/data-2004-06-22.txt 
testdir/data-31-Dec-06.txt -> testdir/data-2006-12-31.txt 


See also Problem 4.4.4 and the datetime module (Section 4.5.3). 


4.4.3 Exercises 

Problems 

P4.4.1 Modify the hailstone sequence generator of Exercise P2.5.7 to generate the 
hailstone sequence starting at any positive integer, n, that the user provides on the 
command line (use sys.argv). Handle gracefully the case where the user forgets to 
provide n or provides an invalid value for n. 


P4.4.2 The Haversine formula gives the shortest (great-circle) distance, d, between 
two points on a sphere of radius R from their longitudes ($\lambda_1$, $\lambda_2$) and 
latitudes ($\phi_1$, $\phi_2$): 

$$ d = 2R \arcsin\left(\sqrt{\operatorname{haversin}(\phi_2 - \phi_1) + \cos\phi_1 \cos\phi_2 \, \operatorname{haversin}(\lambda_2 - \lambda_1)}\right), $$

where the haversine function of an angle is defined by 

$$ \operatorname{haversin}(\alpha) = \sin^2\left(\frac{\alpha}{2}\right). $$


Write a program to calculate the shortest distance in km between two points on 
the surface of the Earth (considered as a sphere of radius 6378.1 km) given as two 
command line arguments, each of which is a comma-separated pair of latitude, longitude 
values in degrees. For example, the distance between Paris and Rome is given by 
executing: 

python greatcircle.py 48.9,2.4 41.9,12.5 
1107 km 


P4.4.3 Write a Python program to create a directory, test, in the user’s home directory 
and to populate it with 20 Scalable Vector Graphics (SVG) files depicting a small, 
filled red circle inside a large, black, unfilled circle. For example, 

<?xml version="1.0" encoding="utf-8"?>
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     width="500" height="500" style="background: #ffffff">
<circle cx="250.0" cy="250.0" r="200" style="stroke: black; stroke-width: 2px;
        fill: none;"/>
<circle cx="430.0" cy="250.0" r="20" style="stroke: red; fill: red;"/>
</svg>

Each file should move the red circle around the inside rim of the larger circle so that 
the 20 files together could form an animation. 

One way to turn the files into an animation is to use the free ImageMagick software 
(www.imagemagick.org/). Ensure the SVG files are named fig00.svg, fig01.svg, etc. and 
issue the following command from your operating system’s command line: 

convert -delay 5 -loop 0 fig*.svg animation.gif 

to produce an animated GIF image. 

P4.4.4 Modify the program of Example E4.14 to catch the following errors and handle 
them gracefully: 

• User does not provide a directory name on the command line (issue a usage 
message); 

• The directory does not exist; 

• The name of a file in the directory does not have the correct format; 

• The filename is in the correct format but the month abbreviation is not recog- 
nized. 

Your program should terminate in the first two cases and skip the file in the second two. 


4.5 Modules and packages 

As we have seen, Python is quite a modular language and has functionality beyond 
the core programming essentials (the built-in methods and data structures we have 
encountered so far) that is made available to a program through the import statement. 
This statement makes reference to modules that are ordinary Python files containing 
definitions and statements. Upon encountering the line 


import <module> 


the Python interpreter executes the statements in the file <module>.py and enters the 
module name <module> into the current namespace, so that the attributes it defines are 
available with the “dotted syntax”: <module>.<attribute>. 

Defining your own module is as simple as placing code within a file <module>.py, 
which is somewhere the Python interpreter can find it (for small projects, usually just the 
same directory as the program doing the importing). Note that because of the syntax of 
the import statement, you should avoid naming your module anything that isn’t a valid 
Python identifier (see Section 2.2.3). For example, the filename <module>.py should 
not contain a hyphen or start with a digit. Do not give your module the same name as 
any built-in modules (such as math or random) because these get priority when Python 
imports. 
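
One way to check a candidate module name against the identifier rule is sketched below. The helper is_valid_module_name is hypothetical (it is not part of Python); it relies on the string method isidentifier and the standard keyword module.

```python
import keyword

def is_valid_module_name(name):
    """Return True if name is a valid identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

# A hyphen, a leading digit or a reserved word all rule a name out.
for candidate in ('my_module', 'my-module', '2fast', 'class'):
    print(candidate, is_valid_module_name(candidate))
```

Only the first of the four candidates passes, so a file named my_module.py can be imported with import my_module, whereas my-module.py cannot be named in an import statement.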

A Python package is simply a structured arrangement of modules within a directory 
on the file system. Packages are the natural way to organize and distribute larger Python 
projects. To make a package, the module files are placed in a directory, along with a file 
named __init__.py. This file is run when the package is imported and may perform 
some initialization and its own imports. It may be an empty file (zero bytes long) if no 
special initialization is required, but it must exist for the directory to be considered by 
Python to be a package. 

For example, the NumPy package (see Chapter 6) exists as the following directory 
(some files and directories have been omitted for clarity): 


numpy/
    __init__.py
    core/
    fft/
        __init__.py
        fftpack.py
        info.py
    linalg/
        __init__.py
        linalg.py
        info.py
    polynomial/
        __init__.py
        chebyshev.py
        hermite.py
        legendre.py
    random/
    version.py

Table 4.6 Python modules and packages 

Module / Package             Description 

os, sys                      Operating system services, as described in Section 4.4 
math, cmath                  Mathematical functions, as introduced in Section 2.2.2 
random                       Random number generator (see Section 4.5.1) 
collections                  Data types for containers that extend the functionality of 
                             dictionaries, tuples, etc. 
itertools                    Tools for efficient iterators that extend the functionality 
                             of simple Python loops 
glob                         Unix-style pathname pattern expansion 
datetime                     Parsing and manipulating dates and times (see Section 4.5.3) 
fractions                    Rational number arithmetic 
re                           Regular expressions 
argparse                     Parser for command line options and arguments 
urllib                       URL (including web pages) opening, reading and parsing 
                             (see Section 4.5.2) 
* Django (django)            A popular web application framework 
* pyparsing                  Lexical parser for simple grammars 
pdb                          The Python debugger 
logging                      Python’s built-in logging module 
xml, lxml                    XML parsers 
* VPython (visual)           Three-dimensional visualization 
unittest                     Unit testing framework for systematically testing and 
                             validating individual units of code (see Section 9.3.4) 
* NumPy (numpy)              Numerical and scientific computing (described in detail in 
                             Chapter 6) 
* SciPy (scipy)              Scientific computing algorithms (described in detail in 
                             Chapter 8) 
* matplotlib, pylab          Plotting (see Chapters 3 and 7) 
* SymPy (sympy)              Symbolic computation (computer algebra) 
* pandas                     Data manipulation and analysis with table-like data 
                             structures 
* scikit-learn               Machine learning 
* Beautiful Soup             HTML parser, with handling of malformed documents 
  (beautifulsoup) 


Thus, for example, polynomial is a subpackage of the numpy package containing 
several modules, including legendre, which may be imported as 

import numpy.polynomial.legendre 

To avoid having to use this full dotted syntax in actually referring to its attributes, it 
is convenient to use 

from numpy.polynomial import legendre 
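
The two import forms can be compared with a package from the Standard Library, so that the sketch runs without NumPy installed. Here os.path stands in for numpy.polynomial.legendre; the alias p is arbitrary.

```python
import os.path                      # full dotted syntax needed: os.path.basename
from os import path                 # shorter: path.basename
from os import path as p            # an alias of our own choosing

# All three names refer to the same module object.
same_module = (os.path is path) and (path is p)

filename = path.basename('/home/brian/test.py')
print(same_module, filename)
```

Whichever form is used, the module is only loaded once; the forms differ only in the name that is entered into the current namespace.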

Table 4.6 lists some of the major, freely available Python modules and packages 
for general programming applications as well as for numerical and scientific work. 
Some are installed with the core Python distribution (the Standard Library); 12 others, 
where indicated, can be downloaded and installed separately. Before implementing your 
own algorithm, check that it isn’t included in an existing Python package. 


12 A complete list of the components of the Standard Library is at https://docs.python.org/3/library/index.html. 


4.5.1 The random module 

For simulations, modeling and some numerical algorithms it is often necessary to generate 
random numbers from some distribution. The topic of random-number generation 
is a complex and interesting one, but the important aspect for our purposes is that, 
in common with most other languages, Python implements a pseudorandom number 
generator (PRNG). This is an algorithm that generates a sequence of numbers that 
approximates the properties of “truly” random numbers. Such sequences are determined 
by an originating seed state and are always the same following the same seed: in this 
sense they are deterministic. This can be a good thing (so that a calculation involving 
random numbers can be reproduced) or a bad thing (e.g., if used for cryptography, where 
the random sequence must be kept secret). Any PRNG will yield a sequence that even- 
tually repeats, and a good generator will have a long period. The PRNG implemented 
by Python is the Mersenne Twister, a well-respected and much-studied algorithm with 
a period of $2^{19937} - 1$ (a number with more than 6,000 digits in base 10). 


Generating random numbers 

The random number generator can be seeded with any hashable object (e.g., an 
immutable object such as an integer). When the module is first imported, it is seeded 
with a representation of the current system time (unless the operating system provides 
a better source of a random seed). The PRNG can be reseeded at any time with a call to 
random.seed. 

The basic random number method is random.random. It generates a random number 
selected from the uniform distribution in the semi-open interval [0,1) - that is, including 
0 but not including 1. 

>>> import random 
>>> random.random()      # PRNG seeded 'randomly' 
0.5204514767709216 
>>> random.seed(42)      # Seed the PRNG with a fixed value 
>>> random.random() 
0.6394267984578837 
>>> random.random() 
0.025010755222666936 
>>> random.seed(42)      # Reseed with the same value as before ... 
>>> random.random()      # ... and the sequence repeats. 
0.6394267984578837 
>>> random.random() 
0.025010755222666936 

Calling random.seed() with no argument reseeds the PRNG with a ‘random’ value 
as when the random module is first imported. 
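
The reproducibility that seeding provides can be checked directly, as in the sketch below. The seed value 42 is arbitrary; random.Random is the class behind the module-level functions and is used here to create a generator with its own independent state.

```python
import random

# Seed the shared, module-level generator and draw three numbers ...
random.seed(42)
first_run = [random.random() for _ in range(3)]

# ... then reseed with the same value: the same numbers come back.
random.seed(42)
second_run = [random.random() for _ in range(3)]

# A separate generator object seeded identically gives the same sequence
# without disturbing the state of the module-level generator.
rng = random.Random(42)
third_run = [rng.random() for _ in range(3)]

print(first_run == second_run == third_run)
```

A separate Random instance is convenient when one part of a program needs a repeatable sequence while the rest does not.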
To select a random floating point number, N, from a given range, a ≤ N ≤ b, use 
random.uniform(a, b): 

>>> random.uniform(-2., 2.) 

-0.899882726523523 

>>> random.uniform(-2., 2.) 

-1.107157047404709 

The random module has several methods for drawing random numbers from nonuniform 
distributions (see the documentation 13); here we mention the most important of 
them. 

To return a number from the normal distribution with mean mu and standard deviation 
sigma, use random.normalvariate(mu, sigma): 

>>> random.normalvariate(100, 15) 

118.82178896586194 

>>> random.normalvariate(100, 15) 

97.92911405885782 

To select a random integer, N, in a given range, a ≤ N ≤ b, use the 
random.randint(a, b) method: 

>>> random.randint(5, 10) 

7 

>>> random.randint(5, 10) 

10 
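
That both end points are included by randint can be confirmed empirically. The following is a sketch only: the seed and the number of draws (10000) are arbitrary choices made for illustration.

```python
import random

random.seed(1)    # arbitrary seed, so that the run is repeatable

# With this many draws every integer from 5 to 10 inclusive turns up.
rolls = [random.randint(5, 10) for _ in range(10000)]
values_seen = sorted(set(rolls))

# Floating point numbers from uniform stay within the requested range.
floats = [random.uniform(-2., 2.) for _ in range(10000)]
in_range = all(-2. <= x <= 2. for x in floats)

print(values_seen, in_range)
```

Note the contrast with range(5, 10), which stops at 9.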


Random sequences 

Sometimes you may wish to select an item at random from a sequence such as a list. 
This is what the method random.choice does: 

>>> seq = [10, 5, 2, 'ni', -3.4] 

>>> random.choice(seq) 

-3.4 

>>> random.choice(seq) 

'ni' 

Another method, random.shuffle, randomly shuffles (permutes) the items of the 
sequence in place: 

>>> random.shuffle(seq) 

>>> seq 

[10, -3.4, 2, 'ni', 5] 

Note that because the random permutation is made in place, the sequence must be 
mutable: you can’t, for example, shuffle tuples. 

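Finally, to draw a list of k unique elements (without replacement) from a sequence, 
population, there is random.sample(population, k). (Since Python 3.11 the population 
must be a sequence, so a set has to be converted to a list or tuple first.) 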

>>> raffle_numbers = range(1, 100001) 

>>> winners = random.sample(raffle_numbers, 5) 

>>> winners 

[89734, 42505, 7332, 30022, 4208] 


13 https://docs.python.org/3/library/random.html. 
The resulting list is in selection order (the first-indexed element is the first drawn) so 
that one could, for example, without bias declare ticket number 89734 to be the jackpot 
winner and the remaining four tickets second-placed winners. 
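
The properties of shuffle and sample described above can be verified with a short sketch. The variable names (deck, shuffled, tuple_was_shuffled) are illustrative only.

```python
import random

deck = list(range(1, 11))

# shuffle works in place, so shuffle a copy to keep the original order.
shuffled = deck[:]
random.shuffle(shuffled)

# A tuple is immutable, so trying to shuffle one raises a TypeError.
try:
    random.shuffle((1, 2, 3))
    tuple_was_shuffled = True
except TypeError:
    tuple_was_shuffled = False

# sample draws without replacement: no ticket can win twice.
winners = random.sample(range(1, 100001), 5)

print(sorted(shuffled) == deck, tuple_was_shuffled, len(set(winners)))
```

To obtain a shuffled version of a tuple t, one option is random.sample(t, len(t)), which returns a new list and leaves t unchanged.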


Example E4.15 The Monty Hall problem is a famous conundrum in probability which 
takes the form of a hypothetical game show. The contestant is presented with three 
doors; behind one is a car and behind each of the other two is a goat. The contestant 
picks a door and then the game show host opens a different door to reveal a goat. The 
host knows which door conceals the car. The contestant is then invited to switch to the 
other closed door or stick with his or her initial choice. 

Counterintuitively, the best strategy for winning the car is to switch, as demonstrated 
by the following simulation. 

Listing 4.5 The Monty Hall problem 

# eg4-montyhall.py
import random

def run_trial(switch_doors, ndoors=3):
    """
    Run a single trial of the Monty Hall problem, with or without switching
    after the game show host reveals a goat behind one of the unchosen doors.
    (switch_doors is True or False). The car is behind door number 1 and the
    game show host knows that. Returns True for a win, otherwise returns False.

    """

    # Pick a random door out of the ndoors available
    chosen_door = random.randint(1, ndoors)
    if switch_doors:
        # Reveal a goat
        revealed_door = 3 if chosen_door==2 else 2
        # Make the switch by choosing any other door than the initially
        # selected one and the one just opened to reveal a goat.
        available_doors = [dnum for dnum in range(1,ndoors+1)
                           if dnum not in (chosen_door, revealed_door)]
        chosen_door = random.choice(available_doors)

    # You win if you picked door number 1
    return chosen_door == 1    # ➊

def run_trials(ntrials, switch_doors, ndoors=3):
    """
    Run ntrials iterations of the Monty Hall problem with ndoors doors, with
    and without switching (switch_doors = True or False). Returns the number
    of trials which resulted in winning the car by picking door number 1.

    """

    nwins = 0
    for i in range(ntrials):
        if run_trial(switch_doors, ndoors):
            nwins += 1
    return nwins

ndoors, ntrials = 3, 10000
nwins_without_switch = run_trials(ntrials, False, ndoors)
nwins_with_switch = run_trials(ntrials, True, ndoors)

print('Monty Hall Problem with {} doors'.format(ndoors))
print('Proportion of wins without switching: {:.4f}'
      .format(nwins_without_switch/ntrials))
print('Proportion of wins with switching: {:.4f}'
      .format(nwins_with_switch/ntrials))


➊ Without loss of generality, we can place the car behind door number 1, leaving the 
contestant initially to choose any door at random. 

To make the code a little more interesting, we have allowed for a variable number of 
doors in the simulation (but only one car). 

Monty Hall Problem with 3 doors 
Proportion of wins without switching: 0.3334 
Proportion of wins with switching: 0.6737 


4.5.2 0 The urllib package 

The urllib package in Python 3 is a set of modules for opening and retrieving the con- 
tent referred to by uniform resource locators (URLs), typically web addresses accessed 
with HTTP(S) or FTP. Here we give a very brief introduction to its use. 

Opening and reading URLs 

To obtain the content at a URL using HTTP you first need to make an HTTP request by 
creating a Request object. For example, 

import urllib.request 

req = urllib.request.Request('http://www.wikipedia.org') 

The Request object allows you to pass data (using GET or POST) and other informa- 
tion about the request (metadata passed through the HTTP headers - see later). For a 
simple request, however, one can simply open the URL immediately as a file-like object 
with urlopen (): 

response = urllib.request.urlopen(req) 

It’s a good idea to catch the two main types of exception that can arise from this 
statement. The first type, URLError, results if the server doesn’t exist or if there is 
no network connection; the second type, HTTPError, occurs when the server returns 
an error code (such as 404: Page Not Found). These exceptions are defined in the 
urllib.error module. 

from urllib.error import URLError, HTTPError

try:
    response = urllib.request.urlopen(req)
except HTTPError as e:
    print('The server returned error code', e.code)
except URLError as e:
    # req.full_url is the URL that the Request object was created with
    print('Failed to reach server at {} for the following reason:\n{}'
          .format(req.full_url, e.reason))
else:
    # the response came back OK
    pass

Assuming the urlopen() call worked, there is often nothing more to do than simply read 
the content from the response: 

content = response.read() 


The content will be returned as a bytestring. To decode it into a Python (Unicode) string 
you need to know how it is encoded. A good resource will include the character set used 
in the Content-Type HTTP header. This can be used as follows: 

charset = response.headers.get_content_charset() 
html = content.decode(charset) 

where html is now a decoded Python Unicode string. If no character set is specified in 
the headers returned, you may have to guess (e.g., set charset='utf-8'). 
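
The fallback just mentioned might be wrapped in a small helper, sketched here without any network access. The function decode_content and its fallback parameter are hypothetical names; replacing undecodable bytes (errors='replace') is one possible design choice, not the only one.

```python
def decode_content(content, charset=None, fallback='utf-8'):
    """Decode the bytes in content, using fallback if charset is None.

    Bytes that cannot be decoded are replaced rather than raising an error.
    """
    return content.decode(charset or fallback, errors='replace')

# Bytes as they might be read from a response, in two different encodings.
latin1_bytes = 'café'.encode('latin-1')
utf8_bytes = 'café'.encode('utf-8')

print(decode_content(latin1_bytes, 'latin-1'))   # charset given by the header
print(decode_content(utf8_bytes))                # no charset: guess UTF-8
```

With a real response, the first argument would be response.read() and the second the value returned by response.headers.get_content_charset(), which is None when the header does not name a character set.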

GET and POST requests 

It is often necessary to pass data along with the URL to retrieve content from a server. 
For example, when submitting an HTML form from a web page, the values correspond- 
ing to the entries you have made are encoded and passed to the server according to either 
the GET or POST protocols. 

The urllib.parse module allows you to encode data from a Python dictionary into 
a form suitable for submission to a web server. To take an example from the Wikipedia 
API using a GET request: 


>>> url = 'http://wikipedia.org/w/api.php' 
>>> data = {'page': 'Monty_Python', 'prop': 'text', 'action': 'parse', 'section': 0} 
>>> encoded_data = urllib.parse.urlencode(data) 
>>> full_url = url + '?' + encoded_data 
>>> full_url 
'http://wikipedia.org/w/api.php?page=Monty_Python&prop=text&action=parse&section=0' 
>>> req = urllib.request.Request(full_url) 
>>> response = urllib.request.urlopen(req) 
>>> html = response.read().decode('utf-8') 

To make a POST request, instead of appending the encoded data to the string <url>?, 
pass it to the Request constructor directly. In Python 3 the data must be a bytes 
object, so the encoded string is first converted with its encode method: 


req = urllib.request.Request(url, encoded_data.encode('utf-8')) 
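
The difference between the two kinds of request can be inspected without contacting any server, because a Request object is only sent when it is passed to urlopen. The sketch below reuses the URL and data from the example above; get_req and post_req are illustrative names.

```python
import urllib.parse
import urllib.request

url = 'http://wikipedia.org/w/api.php'
data = {'page': 'Monty_Python', 'prop': 'text', 'action': 'parse', 'section': 0}
encoded_data = urllib.parse.urlencode(data)

# GET: the encoded data is part of the URL itself.
get_req = urllib.request.Request(url + '?' + encoded_data)

# POST: the encoded data is passed separately, as bytes.
post_req = urllib.request.Request(url, data=encoded_data.encode('utf-8'))

# Nothing has been sent yet; the objects simply record what would be sent.
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

Supplying the data argument is what switches the request method from GET to POST.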


4.5.3 The datetime module 

Python’s datetime module provides classes for manipulating dates and times. 
There are many subtle issues surrounding the handling of such data (time zones, 
different calendars, Daylight Saving Time, etc.) and full documentation is available 
online; 14 here we provide an overview of only the most common uses. 

Dates 

A datetime.date object represents a particular day, month and year in an idealized 
calendar (the current Gregorian calendar is assumed to be in existence for all dates, past 
and future). To create a date object, pass valid year, month and day numbers explicitly, 
or call the date.today constructor: 

>>> from datetime import date 

>>> birthday = date(2004, 11, 5) # OK 

>>> notadate = date(2005, 2, 29) # Oops: 2005 wasn't a leap year 


Traceback (most recent call last): 

File "<stdin>", line 1, in <module> 
ValueError: day is out of range for month 


>>> today = date.today() 

>>> today 

datetime.date(2014, 12, 6) # (for example) 

Dates between 1/1/1 and 31/12/9999 are accepted. Parsing dates to and from strings is 
also supported (see strptime and strftime). 

Some more useful date object methods: 

>>> birthday.isoformat()     # ISO 8601 format: YYYY-MM-DD 
'2004-11-05' 
>>> birthday.weekday()       # Monday = 0, Tuesday = 1, ..., Sunday = 6 
4                            # (Friday) 
>>> birthday.isoweekday()    # Monday = 1, Tuesday = 2, ..., Sunday = 7 
5 
>>> birthday.ctime()         # C-standard time output 
'Fri Nov  5 00:00:00 2004' 

date objects can also be compared (chronologically): 

>>> birthday < today 
True 


>>> today == birthday 
False 
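
Because an invalid date raises a ValueError, the date constructor can itself be used to validate user input. The helper is_valid_date below is a hypothetical name, sketched for illustration.

```python
from datetime import date

def is_valid_date(year, month, day):
    """Return True if year, month, day form a valid calendar date."""
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True

birthday = date(2004, 11, 5)

print(is_valid_date(2004, 2, 29))      # 2004 was a leap year
print(is_valid_date(2005, 2, 29))      # 2005 was not
print(birthday < date(2014, 12, 6))    # chronological comparison
```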


Times 

A datetime.time object represents a (local) time of day to the nearest microsecond. 
To create a time object, pass the number of hours, minutes, seconds and microseconds 
(in that order; missing values default to zero). 


14 https://docs.python.org/3/library/datetime.html. 
>>> from datetime import time 

>>> lunchtime = time(hour=13, minute=30) 

>>> lunchtime 
datetime.time(13, 30) 

>>> lunchtime.isoformat() # ISO 8601 format: HH:MM:SS if no microseconds 

'13:30:00' 

>>> precise_time = time(4,46,36,501982) 

>>> precise_time.isoformat() # ISO 8601 format: HH:MM:SS.mmmmmm 

'04:46:36.501982' 


>>> witching_hour = time(24) # Oops: hour must satisfy 0 <= hour < 24 

Traceback (most recent call last): 

File "<stdin>", line 1, in <module> 

ValueError: hour must be in 0..23 


datetime objects 

A datetime.datetime object contains the information from both the date and time 
objects: year, month, day, hour, minute, second, microsecond. As well as passing 
values for these quantities directly to the datetime constructor, the methods today 
(returning the current date) and now (returning the current date and time) are available: 

>>> from datetime import datetime # (a notoriously ugly import) 

>>> now = datetime.now() 

>>> now 

datetime.datetime(2014, 12, 6, 12, 4, 51, 763063) 

>>> now.isoformat() 

'2014-12-06T12:04:51.763063' 


>>> now.ctime() 

'Sat Dec  6 12:04:51 2014' 


Date and time formatting 

date, time and datetime objects support a method, strftime, to output their values 
as a string formatted according to a syntax set using the format specifiers listed in Table 
4.7. 


>>> birthday.strftime('%A, %d %B %Y') 

'Friday, 05 November 2004' 

>>> now.strftime('%I:%M:%S on %d/%m/%y') 

'12:04:51 on 06/12/14' 

The reverse process, parsing a string into a datetime object, is the purpose of the 
strptime method: 

>>> launch_time = datetime.strptime('09:32:00 July 16, 1969', 
...                                 '%H:%M:%S %B %d, %Y') 

>>> print (launch_time) 

1969-07-16 09:32:00 




Table 4.7 strftime and strptime format specifiers. Note that many of these are 
locale-dependent (e.g., on a German-language system, %A will yield Sonntag, 
Montag, etc.). 

Specifier   Description 

%a          Abbreviated weekday (Sun, Mon, etc.) 
%A          Full weekday (Sunday, Monday, etc.) 
%w          Weekday number (0=Sunday, 1=Monday, ..., 6=Saturday). 
%d          Zero-padded day of month: 01, 02, 03, ..., 31. 
%b          Abbreviated month name (Jan, Feb, etc.) 
%B          Full month name (January, February, etc.) 
%m          Zero-padded month number: 01, 02, ..., 12. 
%y          Year without century (two-digit, zero-padded): 00, 01, ..., 99. 
%Y          Year with century (four-digit, zero-padded): 0001, 0002, ..., 9999. 
%H          24-hour clock hour, zero-padded: 00, 01, ..., 23. 
%I          12-hour clock hour, zero-padded: 01, 02, ..., 12. 
%p          AM or PM (or locale equivalent). 
%M          Minutes (two-digit, zero-padded): 00, 01, ..., 59. 
%S          Seconds (two-digit, zero-padded): 00, 01, ..., 59. 
%f          Microseconds (six-digit, zero-padded): 000000, 000001, ..., 999999. 
%%          The literal % sign. 


>>> print (launch_time.strftime('%I:%M %p on %A, %d %b %Y') ) 
09:32 AM on Wednesday, 16 Jul 1969 
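
Formatting and parsing are inverse operations when the same format string is used for both, as the following sketch shows. Only numeric specifiers are used, so the result does not depend on the locale; the second part applies the same idea to the two-digit years of Example E4.14.

```python
from datetime import datetime

launch_time = datetime(1969, 7, 16, 9, 32)
fmt = '%Y-%m-%d %H:%M:%S'

# Format the datetime as a string, then parse the string back again.
text = launch_time.strftime(fmt)
parsed = datetime.strptime(text, fmt)
print(text, parsed == launch_time)

# A two-digit year parsed with %y is assigned a century by strptime.
file_date = datetime.strptime('02-02-10', '%d-%m-%y')
print(file_date.strftime('%Y-%m-%d'))
```

The round trip succeeds here because the format contains every field that is nonzero in launch_time; a format that omitted the time, say, would lose that information.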


4.6 0 An introduction to object-oriented programming 

4.6.1 Object-oriented programming basics 

Structured programming styles may be broadly divided into two categories: procedural 
and object-oriented. The programs we have looked at so far in this book have been 
procedural in nature: we have written functions (of the sort that would be called proce- 
dures or subroutines in other languages) that are called, passed data, and which return 
values from their calculations. The functions we have defined do not hold their own 
data or remember their state in between being called, and we haven’t modified them 
after defining them. 

An alternative programming paradigm that has gained popularity through the use 
of languages such as C++ and Java is object-oriented programming. In this context, 
an object represents a concept of some sort which holds data about itself (attributes) 
and defines functions (methods) for manipulating data. That manipulation may cause a 
change in the object’s state (i.e., it may change some of the object’s attributes). An object 
is created (instantiated) from a “blueprint” called a class, which dictates its behavior by 
defining its attributes and methods. 

Figure 4.2 Basic classes representing a bank account and a customer. [Class diagram: 
BankAccount has the attributes account_number, balance and customer and the methods 
deposit(amount) and withdraw(amount); Customer has the attributes name, address, 
date_of_birth and password and the methods get_age() and change_password().] 

In fact, as we have already pointed out, everything in Python is an object. So, for 
example, a Python string is an instance of the str class. A str object possesses its 
own data (the sequence of characters making up the string) and provides (“exposes”) a 
number of methods for manipulating that data. For example, the capitalize method 
returns a new string object created from the original string by capitalizing its first letter; 
the split method returns a list of strings by splitting up the original string: 

>>> a = 'hello, aloha, goodbye, aloha' 

>>> a.capitalize() 

'Hello, aloha, goodbye, aloha' 

>>> a.split(',') 

['hello', ' aloha', ' goodbye', ' aloha'] 

Even indexing a sequence such as a string or list is really a call to the method 
__getitem__: 

>>> b = [10, 20, 30, 40, 50] 
>>> b.__getitem__(4) 
50 

That is, b[4] is equivalent to b.__getitem__(4). 15 
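
The same equivalence holds for other familiar operations. The sketch below compares a few of them with the special methods they call; the variable names are illustrative.

```python
a = 'hello, aloha, goodbye, aloha'
b = [10, 20, 30, 40, 50]

# Indexing a list or a string calls its __getitem__ method ...
same_item = b[4] == b.__getitem__(4)
same_char = a[0] == a.__getitem__(0)

# ... and the built-in len calls __len__.
same_length = len(b) == b.__len__()

# Methods return new objects: the original string is unchanged.
words = [word.strip() for word in a.split(',')]

print(same_item, same_char, same_length, words)
```

In everyday code the special methods are rarely called by name; the bracket and function syntax is the usual way to reach them.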

Part of the popularity of object-oriented programming, at least for larger projects, 
stems from the way it helps conceptualize the problem that a program aims to solve. 
It is often possible to break a problem down into units of data and operations that it 
is appropriate to carry out on that data. For example, a retail bank deals with people 
who have bank accounts. A natural object-oriented approach to managing a bank would 
be to define a BankAccount class, with attributes such as an account number, balance
and owner, and a second, Customer class with attributes such as a name, address, and 
date of birth. The BankAccount class might have methods for allowing (or forbidding) 
transactions depending on its balance and the Customer class might have methods for 
calculating the customer’s age from their date of birth for example (see Figure 4.2). 

An important aspect of object-oriented programming is inheritance. There is often a 
relationship between objects which takes the form of a hierarchy. Typically, a general 
type of object is defined by a base class, and then customized classes with more special- 
ized functionality are derived from it. In our bank example, there may be different kinds 
of bank accounts: savings accounts, current (checking) accounts, etc. Each is derived 
from a generic base bank account, which might simply define basic attributes such as
a balance and an account number. The more specialized bank account classes inherit
the properties of the base class but may also customize them by overriding (redefining)
one or more methods and may also add their own attributes and methods. This helps
structure the program and encourages code reuse: there is no need to declare an account
number separately for both savings and current accounts because both classes inherit
one automatically from the base class. If a base class is not to be instantiated itself, but
serves only as a template for the derived classes, it is called an abstract class.

Figure 4.3 Two classes derived from an abstract base class: SavingsAccount and
CurrentAccount inherit methods and attributes from BankAccount but also customize
and extend its functionality.

15 The double-underscore syntax usually denotes a name with some special meaning to Python.

In Figure 4.3, the relationship between the base class and two derived subclasses is 
depicted. The base class, BankAccount, defines some attributes (account_number,
balance and customer) and methods (such as deposit and withdraw) common
to all types of account, and these are inherited by the subclasses. The subclass
SavingsAccount adds an attribute and a method for handling interest payments on 
the account; the subclass CurrentAccount instead adds two attributes describing the 
annual account fee and transaction withdrawal limit, and overrides the base withdraw 
method, perhaps to check that the transaction limit has not been reached before a 
withdrawal is allowed. 
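The idea of an abstract class that serves only as a template can be enforced, if desired, with the standard library's abc module. The sketch below is an assumption-laden illustration rather than the approach taken in this chapter's listings: the class names Account and SimpleAccount are hypothetical, and the BankAccount class defined later is abstract by convention only.

```python
from abc import ABC, abstractmethod


class Account(ABC):
    """Hypothetical abstract base class: it cannot be instantiated directly."""

    def __init__(self, balance=0):
        self.balance = balance

    @abstractmethod
    def withdraw(self, amount):
        """Each derived class must supply its own withdrawal rule."""


class SimpleAccount(Account):
    """Hypothetical concrete subclass providing the required method."""

    def withdraw(self, amount):
        if 0 < amount <= self.balance:
            self.balance -= amount


try:
    Account()
    instantiated = True
except TypeError:
    # Classes with unimplemented abstract methods refuse instantiation.
    instantiated = False

acct = SimpleAccount(100)
acct.withdraw(30)
print(instantiated, acct.balance)
```

Without abc, nothing stops a program from creating an instance of a base class; the restriction is then a matter of documentation and discipline.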


4.6.2 Defining and using classes in Python 

A class is defined using the class keyword and indenting the body of statements 
(attributes and methods) in a block following this declaration. It is conventional to give 
classes names written in CamelCase. It is a good idea to follow the class statement 
with a docstring describing what it is that the class does (see Section 2.7.1). Class 
methods are defined using the familiar def keyword, but the first argument to each
method should be a variable named self (see footnote 16): this name is used to refer to the object
itself when it wants to call its own methods or refer to attributes, as we shall see.

In our example of a bank account, the base class could be defined as follows: 

Listing 4.6 The definition of the abstract base class, BankAccount

# bank_account.py
class BankAccount:
    """ An abstract base class representing a bank account."""
    currency = '$'

    def __init__(self, customer, account_number, balance=0):
        """
        Initialize the BankAccount class with a customer, account number
        and opening balance (which defaults to 0.)

        """
        self.customer = customer
        self.account_number = account_number
        self.balance = balance

    def deposit(self, amount):
        """ Deposit amount into the bank account."""
        if amount > 0:
            self.balance += amount
        else:
            print('Invalid deposit amount:', amount)

    def withdraw(self, amount):
        """
        Withdraw amount from the bank account, ensuring there are sufficient
        funds.

        """
        if amount > 0:
            if amount > self.balance:
                print('Insufficient funds')
            else:
                self.balance -= amount
        else:
            print('Invalid withdrawal amount:', amount)



To use this simple class, we can save the code defining it as bank_account.py and
import it into a new program or the interactive Python shell with 

from bank_account import BankAccount 

This new program can now create BankAccount objects and manipulate them by call- 
ing the methods described earlier. 


16 Actually, it could be named anything, but self is almost universally used. 
Instantiating the object

An instance of a class is created with the syntax object = ClassName(args). You
may want to require that an object instantiated from a class should initialize itself in
some way (perhaps by setting attributes with appropriate values); such initialization is
carried out by the special method __init__, which receives any arguments, args,
specified in this statement.

In our example, an account is opened by creating a BankAccount object, passing the 
name of the account owner (customer), an account number and, optionally, an opening 
balance (which defaults to 0 if not provided): 

my_account = BankAccount('Joe Bloggs', 21457288) 

We will replace the string customer with a Customer object in Example E4.16. 

Methods and attributes 

The class defines two methods: one for depositing a (positive) amount of money and 
one for withdrawing money (if the amount to be withdrawn is both positive and not 
greater than the account balance). 

The BankAccount class possesses two different kinds of attribute: self.customer,
self.account_number and self.balance are instance variables: they
can take different values for different objects created from the BankAccount class.
Conversely, the variable currency is a class variable: this variable is defined inside
the class but outside any of its methods and is shared by all instances of the class.
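The distinction between the two kinds of attribute can be seen in a short sketch. The class Account below is a hypothetical, cut-down stand-in for BankAccount, and the currency codes are arbitrary; it shows that rebinding a class variable on the class is visible through every instance, whereas assigning to the same name through one instance creates a separate instance variable on that object only.

```python
class Account:
    """Hypothetical cut-down class, for illustration only."""
    currency = 'USD'               # class variable, shared by all instances

    def __init__(self, balance):
        self.balance = balance     # instance variable, one per object


a1, a2 = Account(10), Account(20)

Account.currency = 'EUR'           # rebinding on the class affects every instance
print(a1.currency, a2.currency)

a1.currency = 'GBP'                # this creates an instance attribute on a1 only
print(a1.currency, a2.currency, Account.currency)
```

This is one reason to change class variables through the class name rather than through an instance.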

Both attributes and methods are accessed using the object.attr notation. For
example,

>>> my_account.account_number    # access an attribute of my_account
21457288
>>> my_account.deposit(64)       # call a method of my_account
>>> my_account.balance
64


Let’s add a third method, for printing the balance of the account. This must be defined 
inside the class block: 

    def check_balance(self):
        """ Print a statement of the account balance. """
        print('The balance of account number {:d} is {:s}{:.2f}'
              .format(self.account_number, self.currency, self.balance))


Example E4.16 We now define the Customer class described in the class diagram of
Figure 4.2: an instance of this class will become the customer attribute of the
BankAccount class. Note that it was possible to instantiate a BankAccount object
by passing a string literal as customer. This is a consequence of Python’s dynamic
typing: no check is automatically made that the object passed as an argument to the
class constructor is of any particular type.

The following code defines a Customer class and should be saved to a file called
customer.py:

from datetime import datetime

class Customer:
    """ A class representing a bank customer. """

    def __init__(self, name, address, date_of_birth):
        self.name = name
        self.address = address
        self.date_of_birth = datetime.strptime(date_of_birth, '%Y-%m-%d')
        self.password = '1234'

    def get_age(self):
        """ Calculates and returns the customer's age. """
        today = datetime.today()
        try:
            birthday = self.date_of_birth.replace(year=today.year)
        except ValueError:
            # birthday is 29 Feb but today's year is not a leap year
            birthday = self.date_of_birth.replace(year=today.year,
                                                  day=self.date_of_birth.day - 1)
        if birthday > today:
            return today.year - self.date_of_birth.year - 1
        return today.year - self.date_of_birth.year

Then we can pass Customer objects to our BankAccount constructor: 

>>> from bank_account import BankAccount
>>> from customer import Customer
>>>
>>> customer1 = Customer('Helen Smith', '76 The Warren, Blandings, Sussex',
...                      '1976-02-29')
>>> account1 = BankAccount(customer1, 21457288, 1000)
>>> account1.customer.get_age()
39
>>> print(account1.customer.address)
76 The Warren, Blandings, Sussex
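Because get_age uses the current date, the value 39 shown above depends on when the code is run. The following sketch isolates the same logic in a hypothetical helper function, age_on (not part of the Customer class), which takes the reference date as an explicit argument and uses datetime.date rather than datetime.datetime, so that the leap-year branch can be checked for fixed dates.

```python
from datetime import date


def age_on(date_of_birth, today):
    """Return the age in whole years on the date `today` (hypothetical helper)."""
    try:
        birthday = date_of_birth.replace(year=today.year)
    except ValueError:
        # Born on 29 February, and today's year is not a leap year:
        # treat 28 February as the birthday in that year.
        birthday = date_of_birth.replace(year=today.year,
                                         day=date_of_birth.day - 1)
    if birthday > today:
        return today.year - date_of_birth.year - 1
    return today.year - date_of_birth.year


dob = date(1976, 2, 29)
print(age_on(dob, date(2015, 2, 27)))   # the day before the 28 Feb "birthday"
print(age_on(dob, date(2015, 2, 28)))   # the birthday in a non-leap year
print(age_on(dob, date(2016, 2, 29)))   # a leap year: the true birthday
```

Passing the reference date as a parameter is a design choice made here for testability; the method in the listing above reads it from the system clock.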


4.6.3 Class inheritance in Python 

A subclass may be derived from one or more other base classes with the syntax: 

class SubClass(BaseClass1, BaseClass2, ...):

We will now define the two derived classes (or subclasses ) illustrated in Figure 4.3 
from the base BankAccount class. They can be defined in the same file that defines 
BankAccount or in a different Python file which imports BankAccount. 

class SavingsAccount(BankAccount):
    """ A class representing a savings account. """

    def __init__(self, customer, account_number, interest_rate, balance=0):
        """ Initialize the savings account. """
        self.interest_rate = interest_rate                      # ➊
        super().__init__(customer, account_number, balance)     # ➋

    def add_interest(self):
        """ Add interest to the account at the rate self.interest_rate. """
        self.balance *= (1. + self.interest_rate / 100)

➊ The SavingsAccount class adds a new attribute, interest_rate, and a new
method, add_interest, to its base class, and overrides the __init__ method to
allow interest_rate to be set when a SavingsAccount is instantiated.

➋ Note that the new __init__ method calls the base class’s __init__ method
in order to set the other attributes: the built-in function super allows us to refer to the
parent base class (see footnote 17). Our new SavingsAccount might be used as follows:

>>> my_savings = SavingsAccount('Matthew Walsh', 41522887, 5.5, 1000)
>>> my_savings.check_balance()
The balance of account number 41522887 is $1000.00
>>> my_savings.add_interest()
>>> my_savings.check_balance()
The balance of account number 41522887 is $1055.00

The second subclass, CurrentAccount, has a similar structure: 

class CurrentAccount(BankAccount):
    """ A class representing a current (checking) account. """

    def __init__(self, customer, account_number, annual_fee,
                 transaction_limit, balance=0):
        """ Initialize the current account. """
        self.annual_fee = annual_fee
        self.transaction_limit = transaction_limit
        super().__init__(customer, account_number, balance)

    def withdraw(self, amount):
        """
        Withdraw amount if sufficient funds exist in the account and amount
        is less than the single transaction limit.

        """
        if amount <= 0:
            print('Invalid withdrawal amount:', amount)
            return
        if amount > self.balance:
            print('Insufficient funds')
            return
        if amount > self.transaction_limit:
            print('{0:s}{1:.2f} exceeds the single transaction limit of'
                  ' {0:s}{2:.2f}'.format(self.currency, amount,
                                         self.transaction_limit))
            return
        self.balance -= amount

    def apply_annual_fee(self):
        """ Deduct the annual fee from the account balance. """
        self.balance = max(0., self.balance - self.annual_fee)

17 The built-in function super() called in this way creates a “proxy” object that delegates method calls to
the parent class (in this case, BankAccount).

Downloaded from https://www.cambridge.org/core. New York University, on 21 Feb 2017 at 16:46:20, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9781139871754.004

Note what happens if we call withdraw on a CurrentAccount object: 

>>> my_current = CurrentAccount('Alison Wicks', 78300991, 20., 200.)
>>> my_current.withdraw(220)
Insufficient funds
>>> my_current.deposit(750)
>>> my_current.check_balance()
The balance of account number 78300991 is $750.00
>>> my_current.withdraw(220)
$220.00 exceeds the single transaction limit of $200.00


The withdraw method called is that of the CurrentAccount class, as this method 
overrides that of the same name in the base class, BankAccount. 
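The way Python chooses which method to call can be sketched with two hypothetical classes, Base and Derived (stand-ins for BankAccount and CurrentAccount, with methods that merely return strings). A method is looked up on the object's own class first and then on its base classes, in the order recorded in the class's __mro__ attribute; an overriding method can still reach the base implementation through super().

```python
class Base:
    def withdraw(self, amount):
        return 'Base.withdraw'

    def deposit(self, amount):
        return 'Base.deposit'


class Derived(Base):
    def withdraw(self, amount):
        # Overrides Base.withdraw, but can still delegate to it via super().
        return 'Derived.withdraw, then ' + super().withdraw(amount)


obj = Derived()
print(obj.withdraw(10))    # found on Derived first
print(obj.deposit(10))     # not defined on Derived, so inherited from Base
print([cls.__name__ for cls in Derived.__mro__])    # the lookup order
```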


Example E4.17 A simple model of a polymer in solution treats it as a sequence of
randomly oriented segments; that is, one for which there is no correlation between the
orientation of one segment and any other (this is the so-called random-flight model).

We will define a class, Polymer, to describe such a polymer, in which the segment
positions are held in a list of (x, y, z) tuples. A Polymer object will be initialized with
the values N and a, the number of segments and the segment length respectively. The
initialization method calls a make_polymer method to populate the segment positions
list.

The Polymer object will also calculate the end-to-end distance, R, and will implement
a method calc_Rg to calculate and return the polymer’s radius of gyration,
defined as

$$
R_g = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \mathbf{r}_i - \mathbf{r}_\mathrm{CM} \right)^2}.
$$


Listing 4.7 Polymer class

# polymer.py
import math
import random

class Polymer:
    """ A class representing a random-flight polymer in solution. """

    def __init__(self, N, a):
        """
        Initialize a Polymer object with N segments, each of length a.

        """
        self.N, self.a = N, a
        # self.xyz holds the segment position vectors as tuples
        self.xyz = [(None, None, None)] * N
        # End-to-end vector
        self.R = None

        # Make our polymer by assigning segment positions
        self.make_polymer()

    def make_polymer(self):
        """
        Calculate the segment positions, center of mass and end-to-end
        distance for a random-flight polymer.

        """
        # Start our polymer off at the origin, (0,0,0).
        self.xyz[0] = x, y, z = cx, cy, cz = 0., 0., 0.
        for i in range(1, self.N):
            # ➊ Pick a random orientation for the next segment.
            theta = math.acos(2 * random.random() - 1)
            phi = random.random() * 2. * math.pi
            # Add on the corresponding displacement vector for this segment.
            x += self.a * math.sin(theta) * math.cos(phi)
            y += self.a * math.sin(theta) * math.sin(phi)
            z += self.a * math.cos(theta)
            # Store it, and update our center of mass sum.
            self.xyz[i] = x, y, z
            cx, cy, cz = cx + x, cy + y, cz + z
        # ➋ Calculate the position of the center of mass.
        cx, cy, cz = cx / self.N, cy / self.N, cz / self.N
        # The end-to-end vector is the position of the last
        # segment, since we started at the origin.
        self.R = x, y, z

        # Finally, re-center our polymer on the center of mass.
        for i in range(self.N):
            self.xyz[i] = (self.xyz[i][0] - cx, self.xyz[i][1] - cy,
                           self.xyz[i][2] - cz)

    def calc_Rg(self):
        """
        Calculates and returns the radius of gyration, Rg. The polymer
        segment positions are already given relative to the center of
        mass, so this is just the rms position of the segments.

        """
        self.Rg = 0.
        for x, y, z in self.xyz:
            self.Rg += x**2 + y**2 + z**2
        self.Rg = math.sqrt(self.Rg / self.N)
        return self.Rg


➊ One way to pick the location of the next segment is to pick a random point on the
surface of the unit sphere and use the corresponding pair of angles in the spherical polar
coordinate system, θ and φ (where 0 ≤ θ ≤ π and 0 ≤ φ < 2π), to set the displacement
from the previous segment’s position as

$$
\begin{aligned}
\Delta x &= a \sin\theta \cos\phi \\
\Delta y &= a \sin\theta \sin\phi \\
\Delta z &= a \cos\theta
\end{aligned}
$$

➋ We calculate the position of the polymer’s center of mass, r_CM, and then shift the
origin of the polymer’s segment coordinates so that they are measured relative to this
point (that is, the segment coordinates have their origin at the polymer center of mass).

We can test the Polymer class by importing it in the Python shell:

>>> from polymer import Polymer
>>> polymer = Polymer(1000, 0.5)    # A polymer with 1000 segments of length 0.5
>>> polymer.R                       # End-to-end vector
(5.631332375722011, 9.408046667059947, -1.3047608473668109)
>>> polymer.calc_Rg()               # Radius of gyration
5.183761585363432
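The angle-sampling step used in make_polymer can also be checked in isolation. The sketch below wraps it in a hypothetical helper, random_unit_vector (not part of the Polymer class), and confirms two properties one would expect: every displacement has unit length, and the z components average to approximately zero because cos θ is drawn uniformly from [-1, 1]. The seed is arbitrary and serves only to make the run repeatable.

```python
import math
import random


def random_unit_vector():
    """Return a random direction (dx, dy, dz) on the unit sphere."""
    theta = math.acos(2 * random.random() - 1)   # cos(theta) uniform on [-1, 1]
    phi = random.random() * 2 * math.pi          # phi uniform on [0, 2*pi)
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


random.seed(1)    # arbitrary seed, for repeatability only
vectors = [random_unit_vector() for _ in range(10000)]
lengths = [math.sqrt(dx**2 + dy**2 + dz**2) for dx, dy, dz in vectors]
mean_dz = sum(dz for _, _, dz in vectors) / len(vectors)
print(min(lengths), max(lengths), mean_dz)
```

Drawing θ itself uniformly from [0, π] would instead bunch the points toward the poles, which is why the arccosine of a uniform variate is used.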

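Let’s now compare the distribution of the end-to-end distances with the theoretically
predicted probability density function:

$$
P(R) = 4\pi R^2 \left( \frac{2\pi \langle r^2 \rangle}{3} \right)^{-3/2} \exp\left( -\frac{3R^2}{2\langle r^2 \rangle} \right),
$$

where the mean square position of the segments is $\langle r^2 \rangle = Na^2$.

Listing 4.8 The distribution of random-flight polymers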


# eg4-c-ii-polymer-a.py
# Compare the observed distribution of end-to-end distances for Np random-
# flight polymers with the predicted probability distribution function.
import pylab
from polymer import Polymer
pi = pylab.pi

# Calculate R for Np polymers
Np = 3000
# Each polymer consists of N segments of length a
N, a = 1000, 1.
R = [None] * Np
for i in range(Np):
    polymer = Polymer(N, a)
    Rx, Ry, Rz = polymer.R
    R[i] = pylab.sqrt(Rx**2 + Ry**2 + Rz**2)
    # Output a progress indicator every 100 polymers
    if not (i+1) % 100:
        print(i+1, '/', Np)

# Plot the distribution of R as a normalized histogram
# using 50 bins (density=True replaces the normed argument
# of older Matplotlib versions)
pylab.hist(R, 50, density=True)

# Plot the theoretical probability distribution, Pr, as a function of r
r = pylab.linspace(0, 200, 1000)
msr = N * a**2
Pr = 4.*pi*r**2 * (2 * pi * msr / 3)**-1.5 * pylab.exp(-3*r**2 / 2 / msr)
pylab.plot(r, Pr, lw=2, c='r')
pylab.xlabel('R')
pylab.ylabel('P(R)')
pylab.show()

Figure 4.4 Distribution of the end-to-end distances, R, of random-flight polymers with
N = 1,000, a = 1.


The program above produces a plot that typically looks like Figure 4.4,
suggesting agreement with theory. 
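As a sanity check on the theoretical curve itself, the expression for Pr used in Listing 4.8 can be integrated numerically. The sketch below uses only the standard library and a simple midpoint rule (the grid size and upper limit are arbitrary choices): the density should integrate to 1, and its second moment should reproduce the mean square value msr = N a**2.

```python
import math

N, a = 1000, 1.0
msr = N * a**2    # mean square value, as in Listing 4.8


def P(r):
    """Theoretical probability density of the end-to-end distance r."""
    return (4 * math.pi * r**2 * (2 * math.pi * msr / 3)**-1.5
            * math.exp(-3 * r**2 / (2 * msr)))


# Midpoint-rule integration on [0, 300]; the density is negligible beyond.
n, r_max = 30000, 300.0
dr = r_max / n
rs = [(i + 0.5) * dr for i in range(n)]
norm = sum(P(r) for r in rs) * dr
mean_sq = sum(r**2 * P(r) for r in rs) * dr
print(norm, mean_sq)
```

A normalized histogram has unit area too, which is what makes the direct comparison in Figure 4.4 meaningful.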


4.6.4 Exercises 

Problems 

P4.6.1 a. Modify the base BankAccount class to verify that the account number
passed to its __init__ constructor conforms to the Luhn algorithm described in
Exercise P2.5.3.

b. Modify the CurrentAccount class to implement a free overdraft. The limit
should be set in the __init__ constructor; withdrawals should be allowed to
within the limit.

P4.6.2 Add a method, save_svg to the Polymer class of Example E4.17 to save an 
image of its polymer as an SVG file. Refer to Exercise P4.4.3 for a template of an SVG 
file. 

P4.6.3 Write a program to create an image of a constellation using the data from the 
Yale Bright Star Catalog (http://tdc-www.harvard.edu/catalogs/bsc5.html). 
Create a class, Star, to represent a star with attributes for its name, magnitude and
position in the sky, parsed from the file bsc5.dat which forms part of the catalog.
Implement a method for this class which converts the star’s position on the celestial
sphere as (Right Ascension: α, Declination: δ) to a point in a plane, (x, y), for example
using the orthographic projection about a central point (α0, δ0):

$$
\begin{aligned}
\Delta\alpha &= \alpha - \alpha_0 \\
x &= \cos\delta \sin\Delta\alpha \\
y &= \sin\delta \cos\delta_0 - \cos\delta \cos\Delta\alpha \sin\delta_0
\end{aligned}
$$

Suitably scaled, projected star positions can be output to an SVG image as circles
(with a larger radius for brighter stars). For example, the line

<circle cx="200" cy="150" r="5" stroke="none" fill="#ffffff"/>

represents a white circle of radius 5 pixels, centered on the canvas at (200, 150).

Hint: you will need to convert the right ascension from (hr, min, sec) and the declination
from (deg, min, sec) to radians. Use the data corresponding to “equinox J2000,
epoch 2000.0” in each line of bsc5.dat. Let the user select the constellation from the
command line using its three-letter abbreviation (e.g., ‘Ori’ for Orion): this is given as
part of the star name in the catalog. Don’t forget that star magnitudes are smaller for
brighter stars. If you are using the orthographic projection suggested, choose (α0, δ0) to
be the mean of (α, δ) for stars in the constellation.

P4.6.4 Design and implement a class, Experiment, to read in and store a simple
series of (x, y) data as pylab (i.e., NumPy) arrays from a text file. Include in your
class methods for transforming the data series by some simple function (e.g., x' =
ln x, y' = 1/y) and to perform a linear least-squares regression on the transformed
data (returning the gradient and intercept of the best-fit line, y_fit = mx + c). NumPy
provides methods for performing linear regression (see Section 6.5.3), but for this exercise
the following equations can be implemented directly:

$$
m = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}, \qquad
c = \bar{y} - m\bar{x},
$$

where the bar notation denotes the arithmetic mean of the quantity under it. (Hint:
use pylab.mean(arr) to return the mean of array arr.)

Chloroacetic acid is an important compound in the synthetic production of pharmaceuticals,
pesticides and fuels. At high concentration under strong alkaline conditions its
hydrolysis may be considered as the following reaction:

$$
\mathrm{ClCH_2COO^- + OH^- \rightarrow HOCH_2COO^- + Cl^-}.
$$

Data giving the concentration of ClCH2COO−, c (in M), as a function of time, t (in s),
are provided for this reaction carried out in excess alkali at five different temperatures
in the data files caa-T.txt (T = 40, 50, 60, 70, 80 in °C): these may be obtained
from scipython.com/ex/ade . The reaction is known to be second order and so obeys
the integrated rate law

$$
\frac{1}{c} = \frac{1}{c_0} + kt,
$$

where k is the effective rate constant and c0 the initial (t = 0) concentration of
chloroacetic acid.

Use your Experiment class to interpret these data by linear regression of 1/c against
t, determining m (= k) for each temperature. Then, for each value of k, determine the
activation energy of the reaction through a second linear regression of ln k against 1/T
in accordance with the Arrhenius law:

$$
k = A e^{-E_a/RT} \quad \Rightarrow \quad \ln k = \ln A - \frac{E_a}{RT},
$$

where R = 8.314 J K⁻¹ mol⁻¹ is the gas constant. Note: the temperature must be in
Kelvin.
5 IPython and IPython Notebook


The IPython shell and the related interactive, browser-based IPython Notebook provide 
a powerful interface to the Python language. IPython has several advantages over the 
native Python shell, including easy interaction with the operating system, introspection 
and tab completion. IPython Notebook increasingly is being adopted by scientists to 
share their data and the code they write to analyze it in a standardized manner that aids 
reproducibility and visualization. 


5.1 IPython 

5.1.1 Installing IPython 

Comprehensive details on installing IPython are available at the IPython website: see 
http://ipython.org/install.html, but a summary is provided here. 

IPython is included in the Continuum Anaconda and Enthought Canopy Python dis- 
tributions. To update to the current version within Anaconda, use the conda package 
manager: 

conda update conda 
conda update ipython 

With Canopy, use 

enpkg ipython 

If you are not using these distributions but already have Python installed, there are 
several alternative options. If you have the pip package manager: 

pip install ipython 

pip install "ipython[notebook]" 

It is also possible to manually download the latest IPython version from the github 
repository at https://github.com/ipython/ipython/releases and compile and install from 
its top-level source directory with 

python setup.py install 
5.1.2 Using the IPython Shell

To start an interactive IPython shell session from the command line, simply type 
ipython. You should be greeted with a message similar to this one: 

Python 3.3.5 |Anaconda 2.0.1 (x86_64)| (default, Mar 10 2014, 11:22:25) 

Type "Copyright", "credits" or "license" for more information. 

IPython 2.1.0 -- An enhanced Interactive Python. 

Anaconda is brought to you by Continuum Analytics. 

Please check out: http://continuum.io/thanks and https://binstar.org 
? -> Introduction and overview of IPython's features. 

%quickref -> Quick reference. 

help -> Python's own help system. 

object? -> Details about 'object', use 'object??' for extra details. 


In [1] : 

(The precise details of this message will depend on the setup of your system.) The
prompt In [1]: is where you type your Python statements and replaces the native
Python >>> shell prompt. The counter in square brackets increments with each Python
statement or code block. For example,

In [1]: 4 + 5
Out[1]: 9

In [2]: print(1)
1

In [3]: for i in range(4):
   ...:     print(i, end='')
   ...:
0123

In [4]:

To exit the IPython shell, type quit or exit. Unlike with the native Python shell, no
parentheses are required (see footnote 1).

Help commands 

As listed in the welcome message, there are various helpful commands to obtain infor¬ 
mation about using IPython: 

• Typing a single ‘?’ outputs an overview of the usage of IPython’s main features 
(page down with the space bar or f; page back up with b; exit the help page 
with q). 

• %quickref provides a brief reference summary of each of the main IPython 
commands and “magics” (see Section 5.1.3). 

• help() or help(object) invokes Python’s own help system (interactively or
for object if specified).

• Typing one question mark after an object name provides information about that 
object: see below. 


1 Some find this alone a good reason to use IPython. 
Possibly the most frequently used help functionality provided by IPython is the
introspection provided by the object? syntax. For example,

In [4] : a = [5, 6] 

In [5] : a? 

Type: list 

String form: [5, 6] 

Length: 2 

Docstring: 

list() -> new empty list 

list(iterable) -> new list initialized from iterable's items 

Here, the command a? gives details about the object a: its string representation (which
would be produced by, for example, print(a)), its length (equivalent to len(a)) and
the docstring associated with the class of which it is an instance: since a is a list, this
provides brief details of how to instantiate a list object (see footnote 2).
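The same pieces of information can be gathered in ordinary Python, which may help clarify what the a? syntax reports. The following is a rough sketch only: it does not reproduce how IPython itself assembles or formats its output, and the dictionary name info is simply a convenience introduced here.

```python
a = [5, 6]

# Roughly the fields reported by a? in the session above.
info = {
    'Type': type(a).__name__,
    'String form': str(a),
    'Length': len(a),
    'Docstring': type(a).__doc__,
}
for key, value in info.items():
    print('{}: {}'.format(key, value))
```

The exact docstring text printed depends on the Python version in use.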

The ? syntax is particularly useful as a reminder of the arguments that a function or 
method takes. For example, 

In [6]: import pylab 
In [7]: pylab.linspace? 

String form: <function linspace at 0x10432d560>
File: /Users/christian/anaconda/envs/py33/lib/python3.3/site-packages/numpy/core/function_base.py

Definition: pylab.linspace(start, stop, num=50, endpoint=True, retstep=False) 

Docstring: 

Return evenly spaced numbers over a specified interval. 

Returns 'num' evenly spaced samples, calculated over the 
interval ['start', 'stop' ]. 

The endpoint of the interval can optionally be excluded. 

Parameters 


start : scalar 

The starting value of the sequence. 
stop : scalar 

The end value of the sequence, unless 'endpoint' is set to False. 

In that case, the sequence consists of all but the last of ''num + 1'' 
evenly spaced samples, so that 'stop' is excluded. Note that the step 
size changes when 'endpoint' is False, 
num : int, optional 

Number of samples to generate. Default is 50. 
endpoint : bool, optional 

If True, 'stop' is the last sample. Otherwise, it is not included. 
Default is True. 
retstep : bool, optional 

If True, return ('samples', 'step'), where 'step' is the spacing 
between samples. 


2 This is what is meant by introspection: Python is able to inspect its own objects and provide information
about them.




Returns 


See Also 


Examples 


For some objects, the syntax object?? returns more advanced information such as 
the location and details of its source code. 

Tab completion 

Just as with many command line shells, IPython supports tab completion: start typing 
the name of an object or keyword, press the <TAB> key, and it will autocomplete it for 
you or provide a list of options if more than one possibility exists. For example, 

In [8]: w<TAB> 

%%writefile  %who  %who_ls  %whos  while  with


In [8] : w 

If you resume typing until the word becomes unambiguous (e.g., add the letters hi) and
then press <TAB> again, it will be autocompleted to while. The options with percent
signs in front of them are “magic functions,” described in Section 5.1.3.


History 

You may already have used the native Python shell’s command history functionality 
(pressing the up and down arrows through previous statements typed during your current 
session). IPython stores both the commands you enter and the output they produce in
the special variables In and Out (these are, in fact, a list and a dictionary respectively,
and correspond to the prompts at the beginning of each input and output). For example,


In [9]: d = {'C': 'Cador', 'G': 'Galahad', 'T': 'Tristan', 'A': 'Arthur'}

In [10]: for a in 'ACGT':
   ....:     print(d[a])
   ....:
Arthur
Cador
Galahad
Tristan

In [11]: d = {'C': 'Cytosine', 'G': 'Guanine', 'T': 'Thymine', 'A': 'Adenine'}

In [12]: In[10]                                          # ➊
Out[12]: "for a in 'ACGT':\n    print(d[a])\n"

In [13]: exec(In[10])                                    # ➋
Adenine
Cytosine
Guanine
Thymine

➊ Note that In[10] simply holds the string version of the Python statement (here a
for loop) that was entered at index 10.

➋ To actually execute the statement (with the current dictionary d), we must send it to
Python’s exec built-in (see also the %rerun magic, Section 5.1.3).

There are a couple of further shortcuts: the alias _iN is the same as In[N], _N is
the same as Out[N], and the two most recent outputs are returned by the variables _
and __ respectively.
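The behavior of exec on a stored input can be reproduced outside IPython. In the sketch below the variable source is a stand-in for In[10] (plain Python has no In list), and the dictionary is passed explicitly as the namespace in which the stored code runs; the second call captures the printed output, using contextlib.redirect_stdout, so that it can be inspected.

```python
import contextlib
import io

# A stand-in for In[10]: the stored source text of the for loop.
source = "for a in 'ACGT':\n    print(d[a])\n"

d = {'C': 'Cador', 'G': 'Galahad', 'T': 'Tristan', 'A': 'Arthur'}
exec(source, {'d': d})        # runs the loop with the first dictionary

d = {'C': 'Cytosine', 'G': 'Guanine', 'T': 'Thymine', 'A': 'Adenine'}
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(source, {'d': d})    # same source, now run with the new dictionary
bases = buffer.getvalue().split()
print(bases)
```

The stored string is just text: it is evaluated afresh each time, against whatever d refers to at that moment.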

To view the contents of the history, use the %history or %hist magic function. By 
default only the entered statements are output; it is often more useful to output the line 
numbers as well, which is achieved using the -n option:

In [14]: %history -n
   1: 4 + 5
   2: print(1)
   3:
for i in range(4):
    print(i)
   4: a = [5, 6]
   5: a?
   6: import pylab
   7: pylab.linspace?
   8:
   9: d = {'C': 'Cador', 'G': 'Galahad', 'T': 'Tristan', 'A': 'Arthur'}
  10:
for a in 'ACGT':
    print(d[a])
  11: d = {'C': 'Cytosine', 'G': 'Guanine', 'T': 'Thymine', 'A': 'Adenine'}
  12: In[10]
  13: exec(In[10])
  14: %history -n


To output a specific line or range of lines, refer to them by number and/or number range
when calling %history:

In [15]: %history 4
a = [5, 6]

In [16]: %history -n 2-5
   2: print(1)
   3:
for i in range(4):
    print(i)
   4: a = [5, 6]
   5: a?

In [17]: %history -n 1-3 7 12-14
   1: 4 + 5
   2: print(1)
   3:
for i in range(4):
    print(i)
   7: pylab.linspace?
  12: In[10]
  13: exec(In[10])
  14: %history -n

This syntax is also used by several other IPython magic functions (see the following
section). The %history function can also take an additional option: -o displays the
output as well as the input.

Pressing CTRL-R brings up a prompt, the somewhat cryptic (reverse-i-search)`':,
from which you can search within your command history (see footnote 3).


Interacting with the operating system 

IPython makes it easy to execute operating system commands from within your shell
session: any statement preceded by an exclamation mark, !, is sent to the operating
system command line (the “system shell”) instead of being executed as a Python
statement. For example, you can delete files, list directory contents and even execute other
programs and scripts:

In [11]: !pwd    # return the current working directory
/Users/christian/research

In [12]: !ls     # list the files in this directory
Meetings    Papers    code    books
databases   temp-file

In [13]: !rm temp-file    # delete temp-file

In [14]: !ls
Meetings    Papers    code    books
databases


Note that, for technical reasons,⁴ the cd (Unix-like systems) and chdir (Windows) 
commands must be executed as IPython magic functions: 


In [15]: %cd /          # Change into root directory
In [16]: !ls
Applications   Volumes   usr       Library
bin            net       Network   cores
opt            WWW       System    dev
private        sbin      Users     home

In [17]: %cd ~/temp     # Change directory to temp within user's home directory
In [18]: !ls
output.txt   test.py   readme.txt   utils
zigzag.py


If you use Windows and want to include a drive letter (such as C:) in the directory path 
you should enclose the path in quotes: %cd 'C:\My Documents'. 

Help, via !command?, and tab completion, as described in Section 5.1.2, work within 
operating system commands. 

You can pass the values of Python variables to operating system commands by 
prefixing the variable name with a dollar sign, $: 

In [19]: python_script = 'zigzag.py'

In [20]: !ls $python_script
zigzag.py

In [21]: text_files = '*.txt'

❶ In [22]: text_file_list = !ls $text_files
In [23]: text_file_list
output.txt readme.txt

In [24]: readme_file = text_file_list[1]

In [25]: !cat $readme_file
This is the file readme.txt
Each line of the file appears as an item
in a list when returned from !cat readme.txt

❷ In [26]: readme_lines = !cat $readme_file

In [27]: readme_lines
Out[27]:
['This is the file readme.txt',
 'Each line of the file appears as an item',
 'in a list when returned from !cat readme.txt']

❶ Note that the output of a system command can be assigned to a Python variable, here 
a list of the .txt files in the current directory. 

❷ The cat system command returns the contents of the text file; IPython splits this 
output on the newline character and assigns the resulting list to readme_lines. See 
also Section 5.1.3. 

3 This functionality may be familiar to users of the bash shell. 

4 System commands executed via the !command method spawn their own shell, which is discarded 
immediately afterward; changing a directory occurs only in this spawned shell and is not reflected in the 
one running IPython. 
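Outside IPython the ! syntax is not available, but a similar result can be had with the standard library. The following is a minimal sketch, not a description of how IPython does it: the helper name shell_lines is hypothetical, and the Python interpreter itself is used as a stand-in for a system command such as cat so that the example runs on any platform.

```python
import subprocess
import sys


def shell_lines(args):
    """Run a command and return its standard output as a list of lines."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    # splitlines() drops the newline characters, giving one item per line,
    # much like the list obtained from  var = !command  in IPython.
    return result.stdout.splitlines()


# A portable stand-in for a system command that prints two lines.
lines = shell_lines([sys.executable, '-c', "print('first'); print('second')"])
print(lines)
```

Passing the command as a list of arguments avoids involving the system shell at all, so no wildcard expansion (such as *.txt) takes place; that is a deliberate simplification here.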


5.1.3 IPython magic functions 

IPython provides many “magic” functions (or simply magics, those commands prefixed 
with %) to speed up coding and experimenting within the IPython shell. Some of the 
more useful ones are described in this section; for more advanced information the reader 
is referred to the IPython documentation.⁵ IPython makes a distinction between line 
magics: those whose arguments are given on a single line, and cell magics (prefixed by 
two percent signs, %%): those which act on a series of Python commands. An example 
is given in Section 5.1.3 where we describe the %%timeit cell magic. 

A list of currently available magic functions can be obtained by typing %lsmagic. 
The magic function %automagic toggles the “automagic” setting: its default is ON 
meaning that typing the name of a magic function without the % will also execute that 
function, unless you have bound the name as a Python identifier (variable name) to some 
object. The same principle applies to system commands: 

In [x]: ls
output.txt   test.py   readme.txt   utils
zigzag.py

In [x]: ls = 0

In [x]: ls    # Now ls is an integer; !ls will still work
Out[x]: 0

Table 5.1 summarizes some useful IPython magics; the following subsections explain 
more fully the less straightforward ones. 


5 http://ipython.org/documentation.html. 
Table 5.1 Useful IPython line magics

Magic           Description

%alias          Create an alias to a system command.
%alias_magic    Create an alias to an existing IPython magic.
%bookmark       Interact with IPython’s directory bookmarking system.
%cd             Change the current working directory.
%dhist          Output a list of visited directories.
%edit           Create or edit Python code within a text editor and then execute it.
%env            List the system environment variables, such as $HOME.
%history        List the input history for this IPython session.
%load           Read in code from a provided file and make it available for editing.
%macro          Define a named macro from previous input for future reexecution.
%paste          Paste input from the clipboard: use this in preference to, for
                example, CTRL-V, to handle code indenting properly.
%pylab          Activate the pylab library within the current session for
                interactive plotting.
%recall         Place one or more input lines from the command history at the
                current input prompt.
%rerun          Reexecute previous input from the numbered command history.
%reset          Reset the namespace for the current IPython session.
%run            Execute a named file as a Python script within the current session.
%save           Save a set of input lines or macro (defined with %macro) to a file
                with a given name.
%sx or !!       Shell execute: run a given shell command and store its output.
%timeit         Time the execution of a provided Python statement.
%who            Output all the currently defined variables.
%who_ls         As for %who, but return the variable names as a list of strings.
%whos           As for %who, but provides more information about each variable.


Aliases and bookmarks 

A system shell command can be given an alias: a shortcut for a shell command that 
can be called as its own magic. For example, on Unix-like systems we could define the 
following alias to list only the directories on the current path: 

In [x]: %alias lstdir ls -d */ 

In [x]: %lstdir 

Meetings/ Papers/ code/ books/ 

databases/ 

Now typing %lstdir has the same effect as !ls -d */. If %automagic is ON this 
alias can also simply be called with lstdir. 

The magic %alias_magic provides a similar functionality for IPython magics. For 
example, if you want to use %h as an alias to %history, type: 

In [x]: %alias_magic h history 
When working on larger projects it is often necessary to switch between different 
directories. IPython has a simple system for maintaining a list of bookmarks which act 
as shortcuts to different directories. The syntax for this magic function is 

%bookmark <name> [directory] 

If [directory] is omitted, it defaults to the current working directory. 

In [x]: %bookmark py ~/research/code/python 
In [x]: %bookmark www /srv/websites 
In [x]: %cd py 

/Users/christian/research/code/python 

It may happen that a directory with the same name as your bookmark is within the 
current working directory. In that case, this directory takes precedence and you must 
use %cd -b <name> to refer to the bookmark. 

A few more useful commands include: 

• %bookmark -l: list all bookmarks 

• %bookmark -d <name>: remove bookmark <name> 

• %bookmark -r: remove all bookmarks 

Timing code execution 

The IPython magic %timeit <statement> times the execution of the single-line 
statement <statement>. The statement is executed N times in a loop, and each loop 
is repeated R times. N is a suitable, usually large, number chosen by IPython to yield 
meaningful results and R is, by default, 3. The average time per loop for the best of the 
R repetitions is reported. For example, to profile the sorting of a random arrangement of 
the numbers 1-100: 

In [x]: import random 

In [x]: numbers = list(range(1,101)) 

In [x]: random.shuffle(numbers) 

In [x]: %timeit sorted(numbers) 

100000 loops, best of 3: 13.2 µs per loop 

Obviously the execution time will depend on the system (processor speed, memory, 
etc.). The aim of repeating the execution many times is to allow for variations in speed 
due to other processes running on the system. You can select N and R explicitly by 
passing values to the options -n and -r respectively: 

In [x]: %timeit -n 10000 -r 5 sorted(numbers) 
10000 loops, best of 5: 11.2 µs per loop 
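The same measurement can be approximated in an ordinary script with the standard library's timeit module. This is a sketch of the idea rather than a description of IPython's internals; the variable names n_loops and n_repeats are chosen here to mirror N and R, and wrapping the statement in a lambda adds a small function-call overhead to each loop.

```python
import random
import timeit

numbers = list(range(1, 101))
random.shuffle(numbers)

n_loops, n_repeats = 10000, 5      # play the roles of N and R in the text

# timeit.repeat returns one total time (in seconds) per repetition,
# each total covering n_loops executions of the statement.
totals = timeit.repeat(lambda: sorted(numbers), number=n_loops, repeat=n_repeats)

# Report the average time per loop for the best (fastest) repetition.
best_per_loop = min(totals) / n_loops
print('{} loops, best of {}: {:.3g} µs per loop'.format(
    n_loops, n_repeats, best_per_loop * 1e6))
```

Taking the minimum rather than the mean of the repetitions follows the reasoning above: slower repetitions mostly reflect interference from other processes, not the cost of the statement itself.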

The cell magic, %%timeit, enables one to time a multiline block of code. For example, 
a naive algorithm to find the factors of an integer n can be examined with 

In [x]: n = 150
In [x]: %%timeit
factors = set()
for i in range(1, n+1):
    if not n % i:
        factors.add(n // i)

100000 loops, best of 3: 16.3 µs per loop 
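For use outside IPython, a multiline block like this is most easily timed by wrapping it in a function. The sketch below does that for the same naive algorithm; the function name naive_factors and the loop count of 1000 are illustrative choices, not part of the original example.

```python
import timeit


def naive_factors(n):
    """Return the set of factors of n by trial division up to n itself."""
    factors = set()
    for i in range(1, n + 1):
        if not n % i:              # i divides n exactly
            factors.add(n // i)    # so n // i is a factor too
    return factors


print(sorted(naive_factors(150)))

# Total time in seconds for 1000 calls, converted to microseconds per call.
n_calls = 1000
elapsed = timeit.timeit(lambda: naive_factors(150), number=n_calls)
print('{:.3g} µs per call'.format(elapsed / n_calls * 1e6))
```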
Recalling and rerunning code 

To reexecute one or more lines from your IPython history, use %rerun with a line 
number or range of line numbers: 


In [1]: import math

In [2]: angles = [0, 30, 45, 60, 90]

In [3]: for angle in angles:
            sine_angle = math.sin(math.radians(angle))
            print('sin({:3d}) = {:8.5f}'.format(angle, sine_angle))

sin(  0) =  0.00000
sin( 30) =  0.50000
sin( 45) =  0.70711
sin( 60) =  0.86603
sin( 90) =  1.00000

In [4]: angles = [15, 45, 75]

In [5]: %rerun 3
=== Executing: ===
for angle in angles:
    sine_angle = math.sin(math.radians(angle))
    print('sin({:3d}) = {:8.5f}'.format(angle, sine_angle))

=== Output: ===
sin( 15) =  0.25882
sin( 45) =  0.70711
sin( 75) =  0.96593

In [6]: %rerun 2-3
=== Executing: ===
angles = [0, 30, 45, 60, 90]
for angle in angles:
    sine_angle = math.sin(math.radians(angle))
    print('sin({:3d}) = {:8.5f}'.format(angle, sine_angle))

=== Output: ===
sin(  0) =  0.00000
sin( 30) =  0.50000
sin( 45) =  0.70711
sin( 60) =  0.86603
sin( 90) =  1.00000


The similar magic function %recall places the requested lines at the command 
prompt but does not execute them until you press Enter, allowing you to modify them 
first if you need to. 

If you find yourself reexecuting a series of statements frequently, you can define a 
named macro to invoke them. Specify line numbers as before: 

In [7]: %macro sines 3
Macro 'sines' created. To execute, type its name (without quotes).
=== Macro contents: ===
for angle in angles:
    sine_angle = math.sin(math.radians(angle))
    print('sin({:3d}) = {:8.5f}'.format(angle, sine_angle))

In [8]: angles = [-45, -30, 0, 30, 45]

In [9]: sines
sin(-45) = -0.70711
sin(-30) = -0.50000
sin(  0) =  0.00000
sin( 30) =  0.50000
sin( 45) =  0.70711
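A macro replays its lines using whatever value angles happens to have at the time. In a script, the usual way to get the same reuse is a function that takes the angles as an argument. The following sketch (the name sine_table is hypothetical) returns the formatted lines rather than printing them directly, which also makes the result easy to check.

```python
import math


def sine_table(angles):
    """Return one formatted line per angle (given in degrees)."""
    return ['sin({:3d}) = {:8.5f}'.format(angle, math.sin(math.radians(angle)))
            for angle in angles]


for line in sine_table([-45, -30, 0, 30, 45]):
    print(line)
```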


Loading, executing and saving code 

To load code from an external file into the current IPython session, use 

%load <filename> 

If you want only certain lines from the input file, specify them after the -r option. 
This magic enters the lines at the command prompt, so they can be edited before being 
executed. 

To load and execute code from a file, use 

%run <filename> 

Pass any command line options after filename; by default IPython treats them the same 
way that the system shell would. There are a few additional options to %run: 

• -i: Run the script in the current IPython namespace instead of an empty one (i.e., 
the program will have access to variables defined in the current IPython session); 

• -e: Ignore sys.exit() calls and SystemExit exceptions; 

• -t: Output timing information at the end of execution (pass an integer to the 
additional option -N to repeat execution that number of times). 

For example, to run my_script.py 10 times from within IPython with timing 
information: 

In [x]: %run -t -N10 my_script.py 
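The idea behind the -i option, running a script with access to names that already exist, can be imitated in plain Python with the standard library's runpy module. This is only a rough analogue and is not how %run is implemented; the script contents and the variable scale are made up for the illustration.

```python
import os
import runpy
import tempfile

# A tiny script that relies on a name, scale, that it does not define itself.
script_source = "result = scale * 2\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'my_script.py')
    with open(path, 'w') as f:
        f.write(script_source)
    # init_globals pre-populates the script's namespace before it runs;
    # run_path returns the namespace as it stands when the script finishes.
    namespace = runpy.run_path(path, init_globals={'scale': 21})

print(namespace['result'])
```

Without init_globals the script would fail with a NameError, which corresponds to the default behaviour of %run: the script starts in an empty namespace.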

To save a range of input lines or a macro to a file, use %save. Line numbers are 
specified using the same syntax as %history. A .py extension is added if you don’t 
add it yourself, and confirmation is sought before overwriting an existing file. For 
example, 

In [x]: %save sines1 1 8 3
The following commands were written to file 'sines1.py':
import math
angles = [-45, -30, 0, 30, 45]
for angle in angles:
    print('sin({:3d}) = {:8.5f}'.format(angle, math.sin(math.radians(angle))))

In [x]: %save sines2 1-3
The following commands were written to file 'sines2.py':
import math
angles = [0, 30, 45, 60, 90]
for angle in angles:
    print('sin({:3d}) = {:8.5f}'.format(angle, math.sin(math.radians(angle))))

Finally, to append to a file instead of overwriting it, use the -a option: 

%save -a <filename> <line numbers> 
Capturing the output of a shell command 

The IPython magic %sx command, equivalent to !!command, executes the shell 
command command and returns the resulting output as a list (split into semantically useful 
parts on the new line character so there is one item per line). This list can be assigned 
to a variable to be manipulated later. For example, 


In [x]: current_working_directory = %sx pwd
In [x]: current_working_directory
['/Users/christian/temp']

In [x]: filenames = %sx ls
In [x]: filenames
Out[x]:
['output.txt',
 'test.py',
 'readme.txt',
 'utils',
 'zigzag.py']


Here, filenames is a list of individual filenames. 

The returned object is actually an IPython.utils.text.SList string list object. 
Among the useful additional features provided by SList are a native method for 
splitting each string into fields delimited by whitespace: fields; for sorting on those fields: 
sort; and for searching within the string list: grep. For example, 


In [x]: files = %sx ls -l
In [x]: files
['total 8',
 '-rw-r--r--  1 christian  staff     93  5 Nov 16:30 output.txt',
 '-rw-r--r--  1 christian  staff  23258  5 Nov 16:31 readme.txt',
 '-rw-r--r--  1 christian  staff    218  5 Nov 16:32 test.py',
 'drwxr-xr-x  2 christian  staff     68  5 Nov 16:32 utils',
 '-rw-r--r--  1 christian  staff    365  5 Nov 16:20 zigzag.py']

In [x]: del files[0]    # strip non-file line 'total 8'

In [x]: files.fields()
Out[x]:
[['-rw-r--r--', '1', 'christian', 'staff', '93', '5', 'Nov', '16:30', 'output.txt'],
 ['-rw-r--r--', '1', 'christian', 'staff', '23258', '5', 'Nov', '16:31', 'readme.txt'],
 ...
 ['-rw-r--r--', '1', 'christian', 'staff', '365', '5', 'Nov', '16:20', 'zigzag.py']]

In [x]: ['{} last modified at {} on {} {}'.format(f[8], f[7], f[5], f[6])
         for f in files.fields()]
Out[x]:
['output.txt last modified at 16:30 on 5 Nov',
 'readme.txt last modified at 16:31 on 5 Nov',
 'test.py last modified at 16:32 on 5 Nov',
 'utils last modified at 16:32 on 5 Nov',
 'zigzag.py last modified at 16:20 on 5 Nov']



The fields method can also take arguments specifying the indexes of the fields to 
output; if more than one index is given the fields are joined by spaces: 

In [x]: files.fields(0)     # First field in each line of files
Out[x]: ['-rw-r--r--', '-rw-r--r--', '-rw-r--r--', 'drwxr-xr-x', '-rw-r--r--']

In [x]: files.fields(-1)    # Last field in each line of files
Out[x]: ['output.txt', 'readme.txt', 'test.py', 'utils', 'zigzag.py']

In [x]: files.fields(8,7,5,6) 

Out[x]: 

['output.txt 16:30 5 Nov', 

'readme.txt 16:31 5 Nov', 

'test.py 16:32 5 Nov', 

'utils 16:32 5 Nov', 

'zigzag.py 16:20 5 Nov'] 

The sort method provided by SList objects can sort by a given field, optionally 
converting the field from a string to a number if required (so that, for example, 10 > 9). 
Note that this method returns a new SList object. 


In [x]: files.sort(4)    # Sort alphanumerically by size (not useful)
Out[x]:
['-rw-r--r--  1 christian  staff    218  5 Nov 16:32 test.py',
 '-rw-r--r--  1 christian  staff  23258  5 Nov 16:31 readme.txt',
 '-rw-r--r--  1 christian  staff    365  5 Nov 16:20 zigzag.py',
 'drwxr-xr-x  2 christian  staff     68  5 Nov 16:32 utils',
 '-rw-r--r--  1 christian  staff     93  5 Nov 16:30 output.txt']

In [x]: files.sort(4, nums=True)    # Sort numerically by size (useful)
Out[x]:
['drwxr-xr-x  2 christian  staff     68  5 Nov 16:32 utils',
 '-rw-r--r--  1 christian  staff     93  5 Nov 16:30 output.txt',
 '-rw-r--r--  1 christian  staff    218  5 Nov 16:32 test.py',
 '-rw-r--r--  1 christian  staff    365  5 Nov 16:20 zigzag.py',
 '-rw-r--r--  1 christian  staff  23258  5 Nov 16:31 readme.txt']


The grep method returns items from the SList containing a given string;⁶ to search 
for a string in a given field only, use the field argument: 

In [x]: files.grep('txt') # Search for lines containing 'txt' 

Out[x]: 

['-rw-r--r-- 1 christian staff 93 5 Nov 16:30 output.txt', 

'-rw-r--r-- 1 christian staff 23258 5 Nov 16:31 readme.txt'] 

In [x]: files.grep('16:32', field=7)    # Search for files created at 16:32 
Out[x]: 

['-rw-r--r-- 1 christian staff 218 5 Nov 16:32 test.py', 

'drwxr-xr-x 2 christian staff 68 5 Nov 16:32 utils'] 
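To make clear what these three methods are doing, here is a plain-Python sketch of roughly equivalent operations on an ordinary list of strings. It is not the SList implementation: the listing is hard-coded from the example above, and the simple substring tests stand in for grep's pattern matching.

```python
listing = [
    '-rw-r--r--  1 christian  staff     93  5 Nov 16:30 output.txt',
    '-rw-r--r--  1 christian  staff  23258  5 Nov 16:31 readme.txt',
    '-rw-r--r--  1 christian  staff    218  5 Nov 16:32 test.py',
    'drwxr-xr-x  2 christian  staff     68  5 Nov 16:32 utils',
    '-rw-r--r--  1 christian  staff    365  5 Nov 16:20 zigzag.py',
]

# Like files.fields(): split each line on whitespace.
fields = [line.split() for line in listing]

# Like files.fields(-1): the last field of each line.
names = [f[-1] for f in fields]

# Like files.sort(4, nums=True): order lines by field 4 read as an integer.
by_size = sorted(listing, key=lambda line: int(line.split()[4]))

# Like files.grep('txt'): keep lines containing the string anywhere.
txt_lines = [line for line in listing if 'txt' in line]

# Like files.grep('16:32', field=7): test a single field only.
at_1632 = [line for line in listing if '16:32' in line.split()[7]]

print(names)
print([line.split()[-1] for line in by_size])
```

Sorting on the string form of field 4 instead (key=lambda line: line.split()[4]) reproduces the "not useful" ordering, because '218' sorts before '93' when compared character by character.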


Example E5.1 RNA encodes the amino acids of a peptide as a sequence of codons, 
with each codon consisting of three nucleotides chosen from the ‘alphabet’: U (uracil), 
C (cytosine), A (adenine) and G (guanine). 

The Python script, codon_lookup.py, available at scipython.com/eg/aab, creates 
a dictionary, codon_table, mapping codons to amino acids where each amino acid is 
identified by its one-letter abbreviation (e.g., R = arginine). The stop codons, signaling 
termination of RNA translation, are identified with the single asterisk character, *. 


6 In fact, its name implies it will match regular expressions as well, but we will not expand on this here. 
The codon AUG signals the start of translation within a nucleotide sequence as well 
as coding for the amino acid methionine. 

This script can be executed within IPython with %run codon_lookup.py (or 
loaded and then executed with %load codon_lookup.py followed by pressing Enter): 


In [x]: %run codon_lookup.py
In [x]: codon_table
Out[x]:
{'GCG': 'A',
 'UAA': '*',
 'GGU': 'G',
 'UCU': 'S',
 ...
 'ACA': 'T',
 'ACC': 'T'}


Let’s define a function to translate an RNA sequence. Type %edit and enter the 
following code in the editor that appears. 

def translate_rna(seq):
    start = seq.find('AUG')
    peptide = []
    i = start
    while i < len(seq)-2:
        codon = seq[i:i+3]
        a = codon_table[codon]
        if a == '*':
            break
        i += 3
        peptide.append(a)
    return ''.join(peptide)

When you exit the editor it will be executed, defining the function, translate_rna: 

IPython will make a temporary file named: /var/folders/fj/yv29fhm91v7_6g 
7sqsylz2940000gp/T/ipython_edit_thunq9/ipython_edit_dltv_i.py 
Editing... done. Executing edited code... 

Out[x]: "def translate_rna(seq):\n    start = seq.find('AUG')\n
    peptide = []\n    i = start\n    while i < len(seq)-2:\n
        codon = seq[i:i+3]\n        a = codon_table[codon]\n
        if a == '*':\n            break\n        i += 3\n
        peptide.append(a)\n    return ''.join(peptide)\n"

Now feed the function an RNA sequence to translate: 

In[x]: seq = 'CAGCAGCUCAUACAGCAGGUAAUGUCUGGUCUCGUCCCCGGAUGUCGCUACCCACGAG 
ACCCGUAUCCUACUUUCUGGGGAGCCUUUACACGGCGGUCCACGUUUUUCGCUACCGUCGUUUUCCCGGUGC 
CAUAGAUGAAUGUU' 

In [x]: translate_rna(seq)
Out[x]: 'MSGLVPGCRYPRDPYPTFWGAFTRRSTFFATVVFPVP'

To read in a list of RNA sequences (one per line) from a text file, seqs.txt, and 
translate them, one could use %sx with the system command cat (or, on Windows, the 
command type): 

In [x]: seqs = %sx cat seqs.txt
In [x]: for seq in seqs:
   ...:     print(translate_rna(seq))
   ...:
MHMLDENLYDLGMKACHEGTNVLDKWRNMARVCSCDYQFK
MQGSDGQQESYCTLPFEVSGMP
MPVEWRTMQFQRLERASCVKDSTFKNTGSFIKDRKVSGISQDEWAYAMSHQMQPAAHYA
MIWTMCQ
MGQCMRFAPGMHGMYSSFHPQHKEITPGIDYASMNEVETAETIRPI
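For readers who want to try the translation without the downloadable script, the sketch below builds a codon table directly from the standard genetic code and repeats the calculation in a self-contained form. Two things here are assumptions rather than the book's code: the compact construction of codon_table (standing in for codon_lookup.py) and the guard for sequences with no AUG start codon.

```python
from itertools import product

# Standard genetic code, listed for codons in the order UUU, UUC, UUA, UUG,
# UCU, ... with bases cycling through U, C, A, G. '*' marks a stop codon.
BASES = 'UCAG'
AMINO_ACIDS = ('FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR'
               'IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG')
codon_table = {''.join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_rna(seq):
    """Translate seq from its first AUG until a stop codon or the end."""
    start = seq.find('AUG')
    if start == -1:                # no start codon: nothing to translate
        return ''
    peptide = []
    for i in range(start, len(seq) - 2, 3):
        a = codon_table[seq[i:i+3]]
        if a == '*':
            break
        peptide.append(a)
    return ''.join(peptide)


seq = ('CAGCAGCUCAUACAGCAGGUAAUGUCUGGUCUCGUCCCCGGAUGUCGCUACCCACGAG'
       'ACCCGUAUCCUACUUUCUGGGGAGCCUUUACACGGCGGUCCACGUUUUUCGCUACCGUCGUUUUCCCGGUGC'
       'CAUAGAUGAAUGUU')
print(translate_rna(seq))
```

To process a file of sequences without %sx, the lines can be read with ordinary file handling, for example by looping over open('seqs.txt') and stripping the trailing newline from each line before calling translate_rna.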

5.1.4 Exercises 

Problems 

P5.1.1 Improve on the algorithm to find the number of factors of an integer given in 
Section 5.1.3 by (a) looping the trial factor, i, up to no greater than the square root 
of n (why is it not necessary to test values of i greater than this?), and (b) using a 
generator (see Section 4.3.5). Compare the execution speed of these alternatives using 
the %timeit IPython magic. 

P5.1.2 Using the fastest algorithm from the previous question, devise a short piece 
of code to determine the highly composite numbers less than 100000 and use the 
%%timeit cell magic to time its execution. A highly composite number is a positive 
integer with more factors than any smaller positive integer, for example: 1, 2, 4, 6, 12, 24, 
36, 48, ... 

5.2 IPython Notebook 

IPython Notebook provides an interactive environment for Python programming within 
a web browser. Its main advantage over the more traditional console-based approach of 
the IPython shell is that Python code can be combined with documentation (including 
in rendered LaTeX), images and even rich media such as embedded videos. IPython 
notebooks are increasingly being used by scientists to communicate their research by 
including the computations carried out on data as well as simply the results of those 
computations. The format makes it easy for researchers to collaborate on a project 
and for others to validate their findings by reproducing their calculations on the same 
data. Note that from version 4, the IPython Notebook project has been reformulated as 
Jupyter with bindings for other languages as well as Python. 

5.2.1 IPython notebook basics 

Starting the IPython notebook server 

If you have IPython notebook installed, the server that runs the browser-based interface 
to IPython can be started from the command line with 

ipython notebook 
Figure 5.1 The IPython notebook index page. 


This will open a web browser window at the URL of the local IPython notebook 
application. By default this is http://127.0.0.1:8888 though it will default to a different 
port if 8888 is in use. 

The notebook index page (Figure 5.1) contains a list of the notebooks currently 
available in the directory from which the notebook server was started. This is also the 
default directory to which notebooks will be saved (with the extension .ipynb), so it 
is a good idea to execute the above command somewhere convenient in your directory 
hierarchy for the project you are working on. 

The index page contains three tabs: Notebooks lists the IPython notebooks and 
subdirectories within the current working directory; Running lists those notebooks that are 
currently active within your session (even if they are not open in a browser window); 
Clusters provides an interface to IPython’s parallel computing engine: we will not cover 
this topic in this book. 

From the index page, one can start a new notebook (by clicking on “New Notebook”) 
or open an existing notebook (by clicking on its name). To import an existing notebook 
into the index page, either click where indicated at the top of the page or drag the 
notebook file into the index listing from elsewhere on your operating system. 

To stop the notebook server, press CTRL-C in the terminal window it was started 
from (and confirm at the prompt). 


Editing an IPython notebook 

To start a new notebook, click the “New Notebook” button. This opens a new browser 
tab containing the interface where you will write your code and connects it to an IPython 
kernel, the process responsible for executing the code and communicating the results 
back to the browser. 

The new notebook document (Figure 5.2) consists of a title bar, a menu bar and a 
tool bar, under which is an IPython prompt where you will type the code and markup 
(e.g., explanatory text and documentation) as a series of cells. 

In the title bar the name of the first notebook you open will probably be “Untitled0”; 
click on it to rename it to something more informative. The menu bar contains options 
for saving, copying, printing, rearranging and otherwise manipulating the notebook 
document. The tool bar consists of a series of icons that act as shortcuts for common 
operations that can also be achieved through the menu bar. 


Downloaded from https://www.cambridge.org/core. New York University, on 21 Feb 2017 at 16:46:44, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9781139871754.005 








Figure 5.2 IPython with a new notebook document. 

There are four types of input cells where you can write the content for your notebook: 

• Code cells: the default type of cell, this type of cell consists of executable code. 
As far as this chapter is concerned, the code you write here will be Python, but 
IPython Notebook (now called Jupyter) does provide a mechanism of executing 
code written in other languages such as Julia and R. 

• Heading cells: six levels of heading (from top-level section titles to 
paragraph-level text). When “executed” this type of cell produces a rich-text rendering of 
its contents at an appropriate font size. 

• Markdown cells: this type of cell allows for a rich form of documentation for your 
code. When executed, the input to a markdown cell is converted into HTML, 
which can include mathematical equations, font effects, lists, tables, embedded 
images and videos - see Section 5.2.1. 

• Raw cells: input into this type of cell is not changed by the notebook - its content 
and formatting is preserved exactly. 

Running cells 

Each cell can consist of more than one line of input, and the cell is not interpreted until 
you “run” (i.e., execute) it. This is achieved either by selecting the appropriate option 
from the menu bar (under the “Cell” drop-down submenu), by clicking the “Run cell” 
“play” button on the tool bar, or through the following keyboard shortcuts: 

• Shift-Enter: Execute the cell, showing any output, and then move the cursor 
onto the cell below. If there is no cell below, a new, empty one will be created. 

• CTRL-Enter: Execute the cell in place, but keep the cursor in the current cell. 
Useful for quick “disposable” commands to check if a command works or for 
retrieving a directory listing. 

• Alt-Enter: Execute the cell, showing any output, and then insert and move the 
cursor to a new cell immediately beneath it. 

Two other keyboard shortcuts are useful. When editing a cell the arrow keys navigate 
the contents of the cell (edit mode); from this mode, pressing Esc enters command mode 
from which the arrow keys navigate through the cells. To reenter edit mode on a selected 
cell, press Enter. 

The menu bar, under the “Cell” drop-down submenu, provides many ways of running 
a notebook’s cells: usually, you will want to run the current cell individually or run it 
and all those below it. 
Code cells 

You can enter anything into a code cell that you can when writing a Python program 
in an editor or at the regular IPython shell. Code in a given cell has access to objects 
defined in other cells (providing they have been run). For example, 

In [ ]: n = 10

Pressing Shift-Enter or clicking Run Cell executes this statement (defining n but 
producing no output) and opens a new cell underneath the old one: 

In [1]: n = 10

In [ ]:

Entering the following statements at this new prompt: 

In [ ]: sum_of_squares = n * (n+1) * (2*n+1) // 6
        print('1**2 + 2**2 + ... + {}**2 = {}'.format(n,
                                                      sum_of_squares))

and executing as before produces output and opens a third empty input cell. The 
whole notebook document then looks like: 

In [1]: n = 10

In [2]: sum_of_squares = n * (n+1) * (2*n+1) // 6
        print('1**2 + 2**2 + ... + {}**2 = {}'.format(n,
                                                      sum_of_squares))

Out[2]: 1**2 + 2**2 + ... + 10**2 = 385

In [ ]:

You can edit the value of n in input cell 1 and rerun the entire document to update 
the output. It is worth noting that it is also possible to set a new value for n after the 
calculation in cell 2: 


In [3]: n = 15

running cell 3 and then cell 2 then leaves the output to cell 2 as 

Out[2]: 1**2 + 2**2 + ... + 15**2 = 1240

even though the cell above still defines n to be 10. That is, unless you run the entire 
document from the beginning, the output does not necessarily reflect the output of a 
script corresponding to the code cells taken in order. 
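One way to reduce this dependence on the order in which cells were run is to put the calculation in a function that receives n explicitly, so that its result depends only on its argument. The sketch below (the function name is chosen here for illustration) also checks the closed-form expression against a direct summation for the two values used above.

```python
def sum_of_squares(n):
    """Return 1**2 + 2**2 + ... + n**2 using the closed-form expression."""
    return n * (n + 1) * (2 * n + 1) // 6


for n in (10, 15):
    # Compare the formula with an explicit sum over 1, 2, ..., n.
    direct = sum(k**2 for k in range(1, n + 1))
    assert sum_of_squares(n) == direct
    print('1**2 + 2**2 + ... + {}**2 = {}'.format(n, sum_of_squares(n)))
```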

System commands (those prefixed with ! or !!) and IPython magics can all be used 
within IPython notebook. 

It is also possible to use pylab “inline” in the notebook so that plots show up as 
images embedded in the document. To turn this feature on, use 

In [x]: %pylab inline

By itself this command not only imports the pylab library we used in Chapter 3 (and a 
few other things besides), but also imports its symbols into the namespace of your 
interactive session. That is, the %pylab inline magic has the effect of from pylab import * 
and you can type, for example, plot(x, y) instead of pylab.plot(x, y). To 
prevent this behavior, we recommend adding the argument --no-import-all: 

In [x]: %pylab inline --no-import-all

This stops pylab from polluting your namespace with its own definitions.⁷ 


Markdown cells 

Markdown cells convert your input text into HTML, applying styles according to a 
simple syntax illustrated below. The full documentation is at 
http://daringfireball.net/projects/markdown/ 

Here we explain the most useful features. A complete notebook of these examples can 
be downloaded from 

scipython.com/book/markdown 


Basic markdown 

• Simple styles can be applied by enclosing text by asterisks or underscores: 

In [x]:
Surrounding text by two asterisks denotes
**bold style**; using one asterisk denotes
*italic text*, as does _a single
underscore_.

Surrounding text by two asterisks denotes bold style; using 
one asterisk denotes italic text, as does a single underscore. 

• Block quotes are indicated by a single angle bracket, >: 

In [x]:
> "Climb if you will, but remember that
courage and strength are nought without
prudence, and that a momentary negligence
may destroy the happiness of a lifetime.
Do nothing in haste; look well to each
step; and from the beginning think what
may be the end." - Edward Whymper

    “Climb if you will, but remember that courage and strength are nought 
    without prudence, and that a momentary negligence may destroy 
    the happiness of a lifetime. Do nothing in haste; look well to each 
    step; and from the beginning think what may be the end.” - Edward 
    Whymper 

• Code examples (for illustration rather than execution) are between blank lines and 
indented by four spaces (or a tab). The following will appear in a monospaced 
font with the characters as entered: 

In [x]:
    n = 57
    while n != 1:
        if n % 2:
            n = 3*n + 1
        else:
            n //= 2

n = 57
while n != 1:
    if n % 2:
        n = 3*n + 1
    else:
        n //= 2

7 It is particularly annoying to find your innocent variable names such as f clash with pylab’s own function 
calls. 

• Inline code examples are created by surrounding the text with backticks (`): 

In [x]:
Here are some Python keywords: `for`,
`while` and `lambda`.

Here are some Python keywords: for, while and lambda. 


• New paragraphs are started after a blank line. 


HTML within markdown 


The markdown used by IPython notebooks encompasses HTML, so valid HTML 
entities and tags can be used directly: for example, the <em> tag for emphasis, as can CSS 
styles to produce effects such as underlined text. Even complex HTML such as tables 
can be marked up directly. 


In [x] 


The following <em>Punnett table</em> is <span 
style="text-decoration: underline" >marked 
up</span> in HTML. 

<table style="text-align: center;"> 

<tr> 

<th style="border-top:none; border-left:none;" rowspan="2" colspan="2"></th>

<th colspan="2">Male</th> 

</tr> 

<tr> 

<th>A</th> 

<th>a</th> 

</tr> 

<tr> 

<th rowspan="2">Female</th> 

<th>a</th> 

<td style="background: #aaa;">Aa</td> 
<td>aa</td> 

</tr> 

<tr> 

<th>a</th> 

<td style="background: #aaa;">Aa</td> 
<td>aa</td> 

</tr> 

</table> 
The following Punnett table is marked up in HTML.

                   Male
                   A      a
    Female    a    Aa     aa
              a    Aa     aa


Lists 


Itemized (unnumbered) lists are created using any of the markers *, + or -, and nested 
sublists are simply indented. 


In [x] 


The inner planets and their satellites:

* Mercury
* Venus
* Earth
    * The Moon
+ Mars
    - Phobos
    - Deimos


The inner planets and their satellites:

• Mercury
• Venus
• Earth
    ◦ The Moon
• Mars
    ◦ Phobos
    ◦ Deimos

Ordered (numbered) lists are created by preceding items by a number followed by a 
full stop (period) and a space: 


1. Symphony No. 1 in C major, Op. 21
2. Symphony No. 2 in D major, Op. 36
3. Symphony No. 3 in E-flat major ("Eroica"), Op. 55


1. Symphony No. 1 in C major, Op. 21 

2. Symphony No. 2 in D major, Op. 36 

3. Symphony No. 3 in E-flat major ("Eroica"), Op. 55 

Links 

There are three ways of introducing links into markdown text: 
• Inline links provide a URL in round brackets after the text to be turned into a link
in square brackets. For example,

In [x]:

Here is a link to the [IPython website](http://ipython.org/).


Here is a link to the IPython website.

• Reference links label the text to turn into a link by placing a name (containing
letters, numbers or spaces) in square brackets after it. This name is expected to
be defined using the syntax [name]: url elsewhere in the document, as in the
following example markdown cell.

In [x]:

Some important mathematical sequences are the
[prime numbers][primes],
[Fibonacci sequence][fib] and the [Catalan
numbers][catalan_numbers].

[primes]: http://oeis.org/A000040
[fib]: http://oeis.org/A000045
[catalan_numbers]: http://oeis.org/A000108


Some important mathematical sequences are the prime numbers,
Fibonacci sequence and the Catalan numbers.

• Automatic links, for which the clickable text is the same as the URL, are created
simply by surrounding the URL by angle brackets:

In [x] : 

My website is <http://www.christianhill.co.uk>. 


My website is http://www.christianhill.co.uk. 

If the link is to a file on your local system, give as the URL the path, relative to the 
notebook directory, prefixed with files/: 


In [x] 


Here is [a local data file](files/data/data0.txt).


Here is a local data file.

Note that links open in a new browser tab when clicked. 


Mathematics

Mathematical equations can be written in LaTeX and are rendered using the JavaScript
library, MathJax. Inline equations are delimited by single dollar signs; “displayed”
equations by doubled dollar signs:


In [x] 


An inline equation appears within a sentence of
text, as in the definition of the function
$f(x) = \sin(x^2)$; displayed equations get
their own line(s) between lines of text:
$$\int_0^\infty e^{-x^2}dx = \frac{\sqrt{\pi}}{2}.$$


An inline equation appears within a sentence of text, as in the
definition of the function f(x) = sin(x²); displayed equations get
their own line(s) between lines of text:

    ∫₀^∞ e^(−x²) dx = √π / 2.

Images and video 

Links to image files work in exactly the same way as ordinary links (and can be inline or
reference links), but are preceded by an exclamation mark, !. The text in square brackets
between the exclamation mark and the link acts as alt text to the image. For example,


In [x]:

![An interesting plot of the Newton
fractal](/files/images/newton_fractal.png)

![A remote link to a star
image](http://christianhill.co.uk/media/books/python/star.svg)


Video links must use the HTML5 <video> tag, but note that not all browsers support 
all video formats. For example, 


<video controls style="width: 500px;
       margin: 0 auto; display: block;"
       src="files/diffmap-animated.ogv" />


The data constituting images, video and other locally linked content are not embedded 
in the notebook document itself: these files must be provided with the notebook when it 
is distributed. 


5.2.2 Converting notebooks to other formats 

nbconvert is a tool, installed with IPython notebook, to convert notebooks from their
native .ipynb format⁸ to any of several alternative formats. It is run from the (system)
command line as

ipython nbconvert --to <format> <notebook.ipynb>

where notebook.ipynb is the name of the IPython notebook file to be converted and
format is the desired output format. The default (if no format is given) is to produce
a static HTML file, as described below.

Conversion to HTML 

The command

ipython nbconvert <notebook.ipynb>

converts notebook.ipynb to HTML and produces a file, notebook.html, in the current
directory. This file contains all the necessary headers for a stand-alone HTML page,
which will closely resemble the interactive view produced by the IPython notebook
server, but as a static document.

8 This format is, in fact, just a JSON (JavaScript Object Notation) document.

If you want just the HTML corresponding to the notebook without the header
(<html>, <head>, <body> tags, etc.), suitable for embedding in an existing web page,
add the --template basic option.

Any supporting files, such as images, are automatically placed in a directory with
the same base name as the notebook but suffixed with _files. For example, ipython
nbconvert mynotebook.ipynb generates mynotebook.html and the directory
mynotebook_files.

Conversion to LaTeX 

To export the notebook as a LaTeX document, use 

ipython nbconvert --to latex <notebook.ipynb> 

To automatically run pdflatex on the notebook.tex file generated to produce a PDF
file, add the option --post pdf.

Conversion to markdown 

ipython nbconvert --to markdown <notebook.ipynb>

converts the whole notebook into markdown (see Section 5.2.1): cells that are already
in markdown are unaffected and code cells are placed in triple-backtick (```) blocks.

Conversion to Python 

The command 

ipython nbconvert --to python <notebook.ipynb>

converts notebook.ipynb into an executable Python script. If any of the notebook’s
code cells contain IPython magic functions, this script may only be executable from
within an IPython session. Markdown and other text cells are converted to comments in
the generated Python script.
NumPy has become the de facto standard package for general scientific programming
in Python. Its core object is the ndarray, a multidimensional array of a single data
type which can be sorted, reshaped, subject to mathematical operations and statistical
analysis, written to and read from files, and much more. The NumPy implementations of
these mathematical operations and algorithms have two main advantages over the “core”
Python objects we have used until now. First, they are implemented as precompiled C
code and so approach the speed of execution of a program written in C itself; second,
NumPy supports vectorization: a single operation can be carried out on an entire array,
rather than requiring an explicit loop over the array’s elements. For example, compare
the multiplication of two one-dimensional lists of n numbers, a and b, in the core Python
language:

c = []
for i in range(n):
    c.append(a[i] * b[i])

and using NumPy arrays:¹

c = a * b 

The elementwise multiplication is handled by optimized, precompiled C and so is very
fast (much faster for large n than the core Python alternative). The absence of explicit
looping and indexing makes the code cleaner, less error-prone and closer to the standard
mathematical notation it reflects.
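As a rough illustration of the two approaches, the following sketch builds the same elementwise product with an explicit loop and with a single vectorized operation, and then checks that the results agree. The size n and the names a_list, b_list and c_list are arbitrary choices made for this illustration.

```python
import numpy as np

n = 1000
a_list = [0.5 * i for i in range(n)]
b_list = [2.0 for i in range(n)]

# Core Python: an explicit loop over the indexes
c_list = []
for i in range(n):
    c_list.append(a_list[i] * b_list[i])

# NumPy: one vectorized operation on whole arrays
a = np.array(a_list)
b = np.array(b_list)
c = a * b

print(np.allclose(c, c_list))
```

Timing the two versions (for example with IPython's %timeit magic) is a simple way to see the speed difference on your own system.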

All of NumPy’s functionality is provided by the numpy package. To use it, it is
strongly advised to import with

import numpy as np

and then to refer to its attributes with the prefix np. (e.g., np.array). This is the way
we use NumPy in this book.


6.1 Basic array methods 

The NumPy array class is ndarray, which consists of a multidimensional table of
elements indexed by a tuple of integers. Unlike Python lists and tuples, the elements
cannot be of different types: each element in a NumPy array has the same type, which
is specified by an associated data type object (dtype). The dtype of an array specifies
not only the broad class of element (integer, floating point number, etc.) but also how it
is represented in memory (e.g., how many bits it occupies) - see Section 6.1.2.

1 We will use the terms NumPy array and ndarray interchangeably.

The dimensions of a NumPy array are called axes; the number of axes an array has is 
called its rank. 


6.1.1 Creating an array 

Basic array creation 

The simplest way to create a small NumPy array is to call the np.array constructor
with a list or tuple of values:

In [x]: import numpy as np

In [x]: a = np.array( (100, 101, 102, 103) )

In [x]: a
Out[x]: array([100, 101, 102, 103])

In [x]: b = np.array( [[1., 2.], [3., 4.]] )

In [x]: b
Out[x]:
array([[ 1.,  2.],
       [ 3.,  4.]])

Note that passing a list of lists creates a two-dimensional array (and similarly for higher
dimensions).

Indexing a multidimensional NumPy array is a little different from indexing a
conventional Python list of lists: instead of b[i][j], refer to the index of the required
element as a tuple of integers, b[i,j]:


In [x]: b[0,1]          # same as b[(0,1)]
Out[x]: 2.0

In [x]: b[1,1] = 0.     # also for assignment

In [x]: b
Out[x]:
array([[ 1.,  2.],
       [ 3.,  0.]])




The data type is deduced from the type of the elements in the sequence and “upcast”
to the most general type if they are of mixed but compatible types:

In [x]: np.array( [-1, 0, 2.])    # mixture of int and float: upcast to float
Out[x]: array([-1.,  0.,  2.])

You can also explicitly set the data type using the optional dtype argument (see Section
6.1.2):

In [x]: np.array( [0, 4, -4], dtype=complex)
Out[x]: array([ 0.+0.j,  4.+0.j, -4.+0.j])

If your array is large or you do not know the element values at the time of creation,
there are several methods to declare an array of a particular shape filled with default
or arbitrary values. The simplest and fastest, np.empty, takes a tuple of the array’s
shape and creates the array without initializing its elements: the initial element values
are undefined (typically random junk left over from whatever were the contents of the
memory that Python allocated for the array).

In [x]: np.empty( (2,2) )
Out[x]:
array([[ -2.31584178e+077,  -1.72723381e-077],
       [  2.15686807e-314,   2.78134366e-309]])

There are also helper methods np.zeros and np.ones, which create an array of the
specified shape with elements prefilled with 0 and 1 respectively. np.empty, np.zeros
and np.ones also take the optional dtype argument.

In [x]: np.zeros((3,2))    # default dtype is 'float'
Out[x]:
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])

In [x]: np.ones((3,3), dtype=int)
Out[x]:
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])


If you already have an array and would like to create another with the same shape,
np.empty_like, np.zeros_like and np.ones_like will do that for you:

In [x] : a 

Out[x]: array([100, 101, 102, 103]) 

In [x]: np.ones_like(a) 

Out[x]: array([1, 1, 1, 1]) 

In [x]: np.zeros_like(a, dtype=float) 

Out[x]: array([ 0., 0., 0., 0.]) 

Note that the array created inherits its dtype from the original array; to set its data 
type to something else, use the dtype argument. 

Initializing an array from a sequence 

To create an array containing a sequence of numbers there are two methods:
np.arange and np.linspace. np.arange is the NumPy equivalent of range, except
that it can generate floating point sequences. It also actually allocates the memory for
the elements in an ndarray instead of returning a generator-like object - compare
Section 2.4.3.

In [x]: np.arange(7) 

Out[x]: array([0, 1, 2, 3, 4, 5, 6]) 

In [x]: np.arange(1.5, 3., 0.5) 

Out[x]: array([ 1.5,  2. ,  2.5])

As with range, the arrays generated in these examples do not include the end
values, 7 and 3. However, arange has a problem: because of the finite precision
of floating point arithmetic it is not always possible to know how many elements will
be created. For this reason, and because one often wants the last element of a
specified sequence, the np.linspace function can be a more useful way of creating
a sequence.² For example, to generate an evenly spaced array of five numbers
between 1 and 20 inclusive:

In [x]: np.linspace(1, 20, 5) 

Out[x]: array([ 1. , 5.75, 10.5 , 15.25, 20. ]) 
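The following sketch illustrates the point about arange and floating point step sizes. The particular values of start, stop and step are chosen only for illustration; the length of the arange result is printed rather than assumed, since it depends on rounding in the division (stop - start) / step.

```python
import numpy as np

start, stop, step = 1.0, 1.3, 0.1

# arange: the number of elements depends on a floating point division,
# so inspect it rather than assume it
x_arange = np.arange(start, stop, step)
print(len(x_arange), x_arange)

# linspace: the number of points is stated explicitly and the
# end point is included by default
x_linspace = np.linspace(start, stop, 4)
print(len(x_linspace), x_linspace)
```

With linspace the length of the array is fixed by its third argument, whatever the rounding behaviour of the step size.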

np.linspace has a couple of optional boolean arguments. First, setting retstep to
True returns the number spacing (step size):

In [x]: x, dx = np.linspace(0., 2*np.pi, 100, retstep=True)

In [x]: dx
Out[x]: 0.06346651825433926

This saves you from calculating dx = (end-start)/(num-1) separately; in this
example, the 100 points between 0 and 2π inclusive are spaced by 2π/99 =
0.0634665.... Second, setting endpoint to False omits the final point in the
sequence, as for np.arange:

In [x]: x = np.linspace(0, 5, 5, endpoint=False)

In [x]: x
Out[x]: array([ 0.,  1.,  2.,  3.,  4.])

Note that the array generated by np.linspace has the dtype of floating point
numbers, even if the sequence generates integers.


Initializing an array from a function 

To create an array initialized with values calculated using a function, use NumPy’s
np.fromfunction method, which takes as its arguments a function and a tuple
representing the shape of the desired array. The function should itself take the same number
of arguments as dimensions in the array: these arguments index each element at which
the function returns a value. An example will make this clearer:

In [x]: def f(i, j):
   ...:     return 2 * i * j

In [x]: np.fromfunction(f, (4,3))
Out[x]:
array([[  0.,   0.,   0.],
       [  0.,   2.,   4.],
       [  0.,   4.,   8.],
       [  0.,   6.,  12.]])


The function f is called for every index in the specified shape and the values it returns
are used to initialize the corresponding elements.³ A simple expression like this one can
be replaced by an anonymous lambda function (see Section 4.3.3) if desired:

In [x]: np.fromfunction(lambda i,j: 2*i*j, (4,3))
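Because the indexes are passed to the function as whole arrays, the result of np.fromfunction can be reproduced by building the index arrays explicitly. The sketch below does this with np.indices, a NumPy function not otherwise used in this section; the name by_hand is introduced only for this comparison. Note that np.fromfunction passes floating point index arrays unless a dtype is given, so the two results differ in dtype but not in value.

```python
import numpy as np

def f(i, j):
    return 2 * i * j

shape = (4, 3)

# np.indices returns one integer array of indexes per axis
i, j = np.indices(shape)
by_hand = f(i, j)                    # f is called once, on whole arrays

from_func = np.fromfunction(f, shape)

print(np.array_equal(by_hand, from_func))
```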


Example E6.1 To create a “comb” of values in an array of length N for which every
nth element is one but with zeros everywhere else:

In [x]: N, n = 101, 5

In [x]: def f(i):
   ...:     return (i % n == 0) * 1

In [x]: comb = np.fromfunction(f, (N,), dtype=int)

In [x]: print(comb)
[1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0
 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1]

2 We came across linspace in Example E3.1.

3 Note that the indexes are passed as ndarrays and expect the function, f, to use vectorized operations.


ndarray attributes for introspection 

A NumPy array knows its rank, shape, size, dtype and one or two other properties:
these can be determined directly from the attributes described in Table 6.1. For
example,

In [x]: a = np.array(((1, 0, 1), (0, 1, 0)))

In [x]: a.shape
Out[x]: (2, 3)          # 2 rows, 3 columns

In [x]: a.ndim          # rank (number of dimensions)
Out[x]: 2

In [x]: a.size          # total number of elements
Out[x]: 6

In [x]: a.dtype
Out[x]: dtype('int64')

In [x]: a.data
Out[x]: <memory at 0x102387308>


The shape attribute returns the axis dimensions in the same order as the axes are
indexed: a two-dimensional array with n rows and m columns has a shape of (n, m).
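The attributes are related to one another in simple ways, as the following sketch shows. It also uses nbytes, an ndarray attribute that is not listed in Table 6.1 and which gives the total number of bytes occupied by the array's elements.

```python
import numpy as np

a = np.zeros((2, 3), dtype=np.float64)

n_elements = a.size        # 2 * 3 = 6
bytes_each = a.itemsize    # 8 bytes for a float64
total_bytes = a.nbytes     # memory used by the elements alone

print(a.ndim == len(a.shape))
print(total_bytes == n_elements * bytes_each)
```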


6.1.2 NumPy’s basic data types (dtypes) 

So far, the NumPy arrays we have created have contained either integers or floating point
numbers, and we have let Python take care of the details of how these are represented.
However, NumPy provides a powerful way of determining these details explicitly using
data type objects. This is necessary, because in order to interface with the underlying
compiled C code the elements of a NumPy array must be stored in a compatible
format: that is, each element is represented in a fixed number of bytes that are interpreted
in a particular way.


Table 6.1 ndarray attributes

Attribute    Description
shape        The array dimensions: the size of the array along each of its axes,
             returned as a tuple of integers
ndim         Number of axes (dimensions). Note that ndim == len(shape)
size         The total number of elements in the array, equal to the product of the
             elements of shape
dtype        The array’s data type (see Section 6.1.2)
data         The “buffer” in memory containing the actual elements of the array
itemsize     The size in bytes of each element


Table 6.2 Common NumPy data types

Data Type     Description
int_          The default integer type, corresponding to C’s long:
              platform-dependent
int8          Integer in a single byte: -128 to 127
int16         Integer in 2 bytes: -32768 to 32767
int32         Integer in 4 bytes: -2147483648 to 2147483647
int64         Integer in 8 bytes: -2^63 to 2^63 - 1
uint8         Unsigned integer in a single byte: 0 to 255
uint16        Unsigned integer in 2 bytes: 0 to 65535
uint32        Unsigned integer in 4 bytes: 0 to 4294967295
uint64        Unsigned integer in 8 bytes: 0 to 2^64 - 1
float_        The default floating point number type, another name for
              float64
float32       Single-precision, signed float: ~10^-38 to ~10^38 with ~7
              decimal digits of precision
float64       Double-precision, signed float: ~10^-308 to ~10^308 with
              ~15 decimal digits of precision
complex_      The default complex number type, another name for
              complex128
complex64     Single-precision complex number (represented by 32-bit
              floating point real and imaginary components)
complex128    Double-precision complex number (represented by 64-bit
              floating point real and imaginary components)
bool_         The default boolean type represented by a single byte


For example, consider an unsigned integer stored in 2 bytes (16 bits) of memory (the
C-type uint16_t). Such a number can take a value between 0 and 2^16 - 1 = 65535.
No equivalent native Python type exists for this exact representation: Python integers
are signed quantities and memory is dynamically assigned for them as required by their
size. So NumPy defines a data type object, np.uint16, to describe data stored in this
way.

Furthermore, different systems can order the two bytes of this number differently, a
distinction known as endianness. The big-endian convention places the most-significant
byte in the smallest memory address; the little-endian convention places the
least-significant byte in the smallest memory address. In creating your own arrays, NumPy
will use the default convention for the hardware your program is running on, but it
is essential to set the endianness correctly if reading in a binary file generated by a
different computer.
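The two byte orders can be seen directly by inspecting the raw bytes of a one-element array. The sketch below uses the typecode strings '>u2' and '<u2' (big- and little-endian 2-byte unsigned integers, in the notation explained in the remainder of this section) and the ndarray method tobytes; the value 258 is an arbitrary choice whose two bytes are easy to tell apart.

```python
import numpy as np

value = 258    # 0x0102: most-significant byte 0x01, least-significant 0x02

big = np.array([value], dtype='>u2')       # big-endian uint16
little = np.array([value], dtype='<u2')    # little-endian uint16

print(big.tobytes())       # b'\x01\x02'
print(little.tobytes())    # b'\x02\x01'
```

Both arrays hold the same number; only the order of the bytes in memory differs.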
Table 6.3 Common NumPy data type strings

String    Description
i         Signed integer
u         Unsigned integer
f         Floating point number (a)
c         Complex floating point number
b         Boolean value
S, a      String (fixed-length sequence of characters)
U         Unicode

(a) Note that without specifying the byte size, setting dtype='f' creates
a single-precision floating point data type, equivalent to np.float32.

A full list of the numerical data types⁴ is given in the NumPy documentation,⁵
but the more common ones are listed in Table 6.2. They all exist within the numpy
package and so can be referred to as, for example, np.uint16. The data types that
get created by default when using the native Python numerical types are those with a
trailing underscore: np.float_, np.complex_ and np.bool_.

Apparently higher-precision floating point number data types such as float96,
float128 and longdouble are available but are not to be trusted: their implementation
is platform dependent, and on many systems they do not actually offer any
extra precision but simply align array elements on the appropriate byte-boundaries in
memory.

To create a NumPy array of values using a particular data type, use the dtype
argument of any array constructor function (such as np.array, np.zeros, etc.). This
argument takes either a data type object (such as np.uint8) or something that can
be converted into one. It is common to specify the dtype using a string consisting of
a letter indicating the broad category of data type (integer, unsigned integer, complex
number, etc.) optionally followed by a number giving the byte size of the type. For
example,

In [x]: b = np.zeros((3,3), dtype='u4')

creates a 3 x 3 array of unsigned, 32-bit (4-byte) integers (equivalent to np.uint32).
A list of supported data type letters and their meanings is given in Table 6.3.

To specify the endianness, use the prefixes > (big-endian), < (little-endian) or | 
(endianness not relevant). For example, 

In [x]: a = np.zeros((3,3), dtype='>f8') 

In [x]: b = np.zeros((3,3), dtype='<f') 

In [x]: c = np.empty((3,3), dtype='|S4') 

create arrays of big-endian double-precision numbers, little-endian single-precision
numbers and four-character strings respectively.

4 Strictly speaking, these types are array scalar types and not dtypes, but for our use here the distinction is 
not important. 

5 http://docs.scipy.org/doc/numpy/user/basics.types.html. 
In these examples we have passed a typecode string to an array constructor’s dtype
argument, but it is also possible to create a dtype object first and pass that instead:

In [x]: dt = np.dtype('f8')

In [x]: dt

dtype('float64')    # i.e. 8 bytes, double-precision floating point

In [x]: a = np.array([0., 1., -2.], dtype=dt)

dtype objects have a handful of useful introspection methods: 

In [x] : dt.str # a string identifying the data type 

'<f8' 

In [x]: dt.name # data type name and bit-width 

'float64' 

In [x]: dt.itemsize # data type size in bytes 
8 


To copy an array to a new array with a different data type, pass the desired dtype or
typecode to the astype method:

In [x]: a = np.array([1.2345678, 2.5, 3.9])

In [x]: a.astype('float32')    # cast to single-precision float
Out[x]: array([ 1.23456776,  2.5       ,  3.9000001 ], dtype=float32)

In [x]: a.astype(np.uint8)     # cast to unsigned, 1-byte integer
Out[x]: array([1, 2, 3], dtype=uint8)
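Note from the last example that casting a floating point array to an integer type discards the fractional part rather than rounding (3.9 becomes 3). If rounding to the nearest integer is wanted, one option is to round first, for example with NumPy's np.rint function, as in this sketch; the array values and the names truncated and rounded are chosen only for illustration.

```python
import numpy as np

a = np.array([-1.7, -0.2, 0.2, 1.7])

truncated = a.astype(int)           # fractional part discarded
rounded = np.rint(a).astype(int)    # round to the nearest integer first

print(truncated)
print(rounded)
```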

Strings in NumPy arrays are byte strings of a fixed size: each “character” is
represented by a single byte, in contrast to the variable size UTF-8 encoding commonly
used to represent Unicode strings. This is necessary because NumPy arrays have a
predefined, fixed size in which all the elements occupy the same amount of memory so
that they can be indexed efficiently with a constant stride. Unicode strings encoded
with UTF-8, however, represent characters as code points with a variable width (see
Section 2.3.3). Of course, any string is ultimately stored as a sequence of bytes and
Python provides methods for translating between encodings. For example, on a system
encoding strings with UTF-8 by default:

In [x]: s = 'piñata'    # UTF-8 encoded Unicode string

In [x]: b = s.encode()

In [x]: b

b'pi\xc3\xb1ata'        # byte string: ñ is stored in two bytes: hex C3B1

In [x]: len(s), len(b)

(6, 7)                  # 6 UTF-8 encoded characters stored in 7 bytes

In [x]: arr = np.empty((2,2), 'S7')

In [x]: arr[:] = b      # Store the byte string b in array arr

In [x]: arr

array([[b'pi\xc3\xb1ata', b'pi\xc3\xb1ata'],
       [b'pi\xc3\xb1ata', b'pi\xc3\xb1ata']],
      dtype='|S7')

In [x]: arr[0,0]        # returns the byte string

b'pi\xc3\xb1ata'

In [x]: arr[0,0].decode()    # decode the byte string back assuming UTF-8

'piñata'
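One consequence of the fixed element size is worth seeing in action: a byte string longer than the size declared in the dtype is silently truncated when it is stored. The following sketch uses two arbitrary example words; if no dtype is given, NumPy chooses a size large enough for the longest string supplied.

```python
import numpy as np

# Each element has room for exactly 5 bytes
words = np.array([b'numpy', b'matplotlib'], dtype='S5')
print(words)         # the second entry keeps only its first 5 bytes

# Without an explicit dtype the longest string sets the element size
auto = np.array([b'numpy', b'matplotlib'])
print(auto.dtype)
```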
6.1.3 Universal functions (ufuncs)

In addition to the basic arithmetic operations of addition, division and more, NumPy 
provides many of the familiar mathematical functions that the math module 
(Section 2.2.2) does, implemented as so-called universal functions that act on each 
element of an array, producing an array in return without the need for an explicit loop. 
Universal functions are the way NumPy allows for vectorization, which promotes clean, 
efficient and easy-to-maintain code. For example, 

In [x]: x = np.linspace(1,5,5) 

In [x] : x**2 

Out[x]: array([ 1., 4., 9., 16., 25.]) 

In [x] : x - 1 

Out[x]: array([0., 1., 2., 3., 4.]) 

In [x]: np.sqrt(x - 1)

Out[x]: array([ 0., 1., 1.41421356, 1.73205081, 2.]) 

In [x]: y = np.exp(-np.linspace(0., 2., 5)) 

In [x] : np.sin(x - y) 

Out[x]: array([ 0., 0.98431873, 0.48771645, -0.59340065, -0.98842844]) 

Array multiplication occurs elementwise: matrix multiplication is implemented by 
NumPy’s dot function (or using matrix objects, see Section 6.6): 

In [x]: a = np.array( ((1,2), (3,4)) ) 

In [x]: b = a

In [x] : a * b # elementwise multiplication 

Out [x] : 

array([[ 1, 4], 

[ 9, 16]]) 

In [x] : a.dot(b) # or np.dot(a, b) 

Out [x] : 

array([[ 7, 10], 

[15, 22]]) 
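In Python 3.5 and later (with a correspondingly recent NumPy) the infix operator @ is also available for matrix multiplication; this is an addition to the methods described in the text. A minimal sketch, reusing the arrays from the example above:

```python
import numpy as np

a = np.array(((1, 2), (3, 4)))
b = a

product_dot = a.dot(b)    # matrix product using the dot method
product_at = a @ b        # the same product with the infix operator

print(np.array_equal(product_dot, product_at))
```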

Comparison and logic operators (~, & and | for not, and and or respectively) are also
vectorized and result in arrays of boolean values:

In [x]: a = np.linspace(1,6,6)**3 
In [x]: print(a) 

[ 1. 8. 27. 64. 125. 216.] 

In [x]: print(a > 100) 

[False False False False True True] 

In [x]: print((a < 10) | (a > 100)) 

[ True True False False True True] 


6.1.4 NumPy’s special values, nan and inf 

NumPy defines two special values to represent the outcome of calculations that are
not mathematically defined or not finite. The value np.nan (“not a number,” NaN)
represents the outcome of a calculation that is not a well-defined mathematical operation
(e.g., 0/0); np.inf represents infinity.⁶ For example,

In [x]: a = np.arange(4, dtype=float)    # in-place division needs a float array

In [x]: a /= 0                           # [0/0 1/0 2/0 3/0]

In [x]: a
Out[x]: array([ nan,  inf,  inf,  inf])

6 These quantities are defined in accordance with the IEEE 754 standard for floating point numbers.


Do not test nans for equality (np.nan == np.nan is False). Instead, NumPy
provides methods np.isnan, np.isinf and np.isfinite:

In [x]: np.isnan(a)
Out[x]: array([ True, False, False, False], dtype=bool)

In [x]: np.isinf(a)
Out[x]: array([False,  True,  True,  True], dtype=bool)

In [x]: np.isfinite(a)
Out[x]: array([False, False, False, False], dtype=bool)

Note that nan is neither finite nor infinite! (See also Section 9.1.4.)
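In practice these tests are often used to exclude special values before a calculation. The sketch below selects the finite elements of an array by using the boolean array returned by np.isfinite as an index, and also shows np.nansum, a NumPy function that treats nan as zero. The data values are invented for illustration.

```python
import numpy as np

data = np.array([1.5, np.nan, 2.5, np.inf, 4.0])

finite = data[np.isfinite(data)]        # drops both nan and inf
mean_finite = finite.mean()

total_ignoring_nan = np.nansum(data)    # nan treated as zero; inf still counts

print(finite, mean_finite, total_ignoring_nan)
```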


Example E6.2 A magic square is an N x N grid of numbers in which the entries in
each row, column and main diagonal sum to the same number (equal to N(N^2 + 1)/2).
A method for constructing a magic square for odd N is as follows:

Step 1. Start in the middle of the top row, and let n = 1;

Step 2. Insert n into the current grid position;

Step 3. If n = N^2 the grid is complete so stop. Otherwise, increment n;

Step 4. Move diagonally up and right, wrapping to the first column or last row if the
move leads outside the grid. If this cell is already filled, move vertically down
one space instead;

Step 5. Return to step 2.

The following program creates and displays a magic square.

Listing 6.1 Creating a magic square 

# Create an N x N magic square. N must be odd.
import numpy as np

N = 5
magic_square = np.zeros((N,N), dtype=int)

n = 1
i, j = 0, N//2

while n <= N**2:
    magic_square[i, j] = n
    n += 1
    newi, newj = (i-1) % N, (j+1) % N
    if magic_square[newi, newj]:
        i += 1
    else:
        i, j = newi, newj

print(magic_square)
The 5 x 5 magic square output by the earlier example is

[[17 24  1  8 15]
 [23  5  7 14 16]
 [ 4  6 13 20 22]
 [10 12 19 21  3]
 [11 18 25  2  9]]
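The defining property of the square can be checked with a few array operations. The sketch below wraps the method of Example E6.2 in a function and tests the row, column and diagonal sums; the function names make_magic_square and is_magic are hypothetical, and the sum method, np.trace and np.fliplr are NumPy features that have not yet been introduced in the text.

```python
import numpy as np

def make_magic_square(N):
    """Return an N x N magic square (N odd) by the method of Example E6.2."""
    square = np.zeros((N, N), dtype=int)
    i, j = 0, N // 2
    for n in range(1, N**2 + 1):
        square[i, j] = n
        newi, newj = (i - 1) % N, (j + 1) % N
        if square[newi, newj]:
            i = (i + 1) % N       # cell taken: move down one space instead
        else:
            i, j = newi, newj
    return square

def is_magic(square):
    """Check that all rows, columns and both main diagonals have equal sums."""
    N = square.shape[0]
    target = N * (N**2 + 1) // 2
    return bool(np.all(square.sum(axis=0) == target)        # column sums
                and np.all(square.sum(axis=1) == target)    # row sums
                and np.trace(square) == target
                and np.trace(np.fliplr(square)) == target)

print(is_magic(make_magic_square(5)))
```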


6.1.5 Changing the shape of an array 

Whatever the rank of an array, its elements are stored in sequential memory locations
that are addressed by a single index (internally, the array is one-dimensional, but
knowing the shape of the array, Python is able to resolve a tuple of indexes into a single
memory address). NumPy’s arrays are stored in memory in C-style, row-major order,
that is, with the elements of the last (rightmost) index stored contiguously. In a
two-dimensional array, for example, the element a[0,0] is followed by a[0,1]. The array
that follows

In [x]: a = np.array( ((1,2),(3,4)) )

In [x]: print(a)
[[1 2]
 [3 4]]

is stored in memory as the sequential elements [1,2,3,4].⁷

flatten and ravel

Suppose you wish to “flatten” a multidimensional array onto a single axis. NumPy
provides two methods to do this: flatten and ravel. Both flatten the array into its internal
(row-major) ordering, as described earlier. flatten returns an independent copy of the
elements and is generally slower than ravel, which tries to return a view to the flattened
array. An array view is a new NumPy array with, in this case, a different shape from the
original, but it does not “own” its data elements: it references the elements of another
array. Thus, just as with mutable lists (Section 2.4.1), a reassignment of an element of
one array affects the other. An example should make this clear:

In [x]: a = np.array( [[1,2,3], [4,5,6], [7,8,9]] )

In [x]: b = a.flatten()    # create an independent, flattened copy of a

In [x]: b
Out[x]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [x]: b[3] = 0

In [x]: b
Out[x]: array([1, 2, 3, 0, 5, 6, 7, 8, 9])

In [x]: a                  # a is unchanged
Out[x]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Assignment to b didn’t change a because they are completely independent objects that
do not share their data. In contrast, the flattened array created by taking a view on a
with ravel refers to the same underlying data:

In [x]: c = a.ravel()

In [x]: c
Out[x]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [x]: c[3] = 0

In [x]: c
Out[x]: array([1, 2, 3, 0, 5, 6, 7, 8, 9])

In [x]: a
Out[x]:
array([[1, 2, 3],
       [0, 5, 6],
       [7, 8, 9]])

7 This contrasts with Fortran’s column-major ordering, which would store the elements as [1,3,2,4].

You should be aware that although the ravel method “does its best” to return a view to
the underlying data, various array operations (including slicing, see Section 6.1.6) can
leave the elements stored in noncontiguous memory locations, in which case ravel has
no choice but to make a copy.
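Whether ravel has returned a view or a copy can be tested with np.shares_memory, a NumPy function not otherwise used in this chapter. In the sketch below, the transpose of an array is used as a convenient example of an array whose elements are not in row-major order in memory, so flattening it requires a copy.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

c = a.ravel()      # a is contiguous in row-major order: a view is possible
d = a.T.ravel()    # row-major flattening of the transpose needs a copy

print(np.shares_memory(a, c))    # True
print(np.shares_memory(a, d))    # False
```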

resize and reshape 

An array may be resized (in place) to a compatible shape⁸ with the resize method,
which takes the new dimensions as its arguments. If the array doesn’t reference another
array’s data and doesn’t have references to it, resizing to a smaller shape is allowed
and truncates the array; resizing to a larger shape pads with zeros. Array references are
created when, for example, one array is a view on another (they share data) or simply
by assignment (b = a).

In [x]: a = np.linspace(1, 4, 4)    # the array [1. 2. 3. 4.]

In [x]: print(a)
[ 1.  2.  3.  4.]

In [x]: a.resize(2,2)    # reshapes a in place, doesn't return anything

In [x]: print(a)
[[ 1.  2.]
 [ 3.  4.]]

In [x]: a.resize(3,2)    # OK: nothing else references a

In [x]: print(a)
[[ 1.  2.]
 [ 3.  4.]
 [ 0.  0.]]

The reshape method returns a view on the array with its elements reshaped as required. 
The original array is not modified. 


In [x]: a = np.linspace(1, 4, 4)
In [x]: a.resize(3, 2)
In [x]: a
[[ 1.  2.]
 [ 3.  4.]
 [ 0.  0.]]


8 That is, a shape with the same total number of elements. 
In [x]: b = a.reshape(6)
In [x]: print(b)
[ 1.  2.  3.  4.  0.  0.]
In [x]: b.resize(3, 2)    # OK: same number of elements
In [x]: b.resize(2, 2)    # not OK: b is a view on (shares) the same data as a
ValueError: cannot resize this array: it does not own its data
In [x]: a.resize(2, 2)    # also not OK: a shares its data with b
ValueError: cannot resize this array: it does not own its data


Transposing an array 

The method transpose returns a view of an array with the axes transposed. For a
two-dimensional array, this is the usual matrix transpose:


In [x]: a = np.linspace(1, 6, 6).reshape(3, 2)
In [x]: a
Out[x]:
array([[ 1.,  2.],
       [ 3.,  4.],
       [ 5.,  6.]])
In [x]: a.transpose()    # or simply a.T
Out[x]:
array([[ 1.,  3.,  5.],
       [ 2.,  4.,  6.]])

Note that transposing a one-dimensional array returns the array unchanged:

In [x]: b = np.array([100, 101, 102, 103])
In [x]: b.transpose()
Out[x]: array([100, 101, 102, 103])


The np.matrix object has methods for converting between column and row vectors
if this is what you want; see also Section 6.1.6.


Merging and splitting arrays 

A clutch of NumPy methods merge and split arrays in different ways. np.vstack, 
np.hstack and np.dstack stack arrays vertically (in sequential rows), horizontally 
(in sequential columns) and depthwise (along a third axis). For example, 


In [x]: a = np.array([0, 0, 0, 0])
In [x]: b = np.array([1, 1, 1, 1])
In [x]: c = np.array([2, 2, 2, 2])
In [x]: np.vstack((a, b, c))
Out[x]:
array([[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2]])
In [x]: np.hstack((a, b, c))
Out[x]: array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
In [x]: np.dstack((a, b, c))
Out[x]:
array([[[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]]])


Note that the array created contains an independent copy of the data from the original 
arrays. 9 

The inverse operations, np.vsplit, np.hsplit and np.dsplit split a single array
into multiple arrays by rows, columns or depth. In addition to the array to be split,
these methods require an argument indicating how to split the array. If this argument
is a single integer, the array is split into that number of equal-sized arrays along the
appropriate axis. For example,


In [x]: a = np.arange(6)
In [x]: a
Out[x]: array([0, 1, 2, 3, 4, 5])
In [x]: np.hsplit(a, 3)
Out[x]: [array([0, 1]), array([2, 3]), array([4, 5])]


A list of array objects is returned. If the second argument is a sequence of integer
indexes, the array is split on those indexes:

In [x]: a
Out[x]: array([0, 1, 2, 3, 4, 5])
In [x]: np.hsplit(a, (2, 3, 5))
Out[x]: [array([0, 1]), array([2]), array([3, 4]), array([5])]

This is the same as the list [a[:2], a[2:3], a[3:5], a[5:]]. Unlike with
np.hstack, etc., the arrays returned are views on the original data. 10
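
The contrast between the copying behavior of the stacking functions and the views returned by the splitting functions can be seen in a short sketch such as the following. It assumes np.shares_memory is available; the variable names are illustrative.

```python
import numpy as np

a = np.arange(6)
pieces = np.hsplit(a, 3)        # three views on the data of a
stacked = np.hstack(pieces)     # a new array holding its own copy

pieces[0][0] = 99               # writes through to a ...
print(a)
print(stacked)                  # ... but the stacked copy is unaffected

print(np.shares_memory(a, pieces[1]))
print(np.shares_memory(a, stacked))
```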


Example E6.3 Suppose you have a 3 x 3 array to which you wish to add a row or
column. Adding a row is easy with np.vstack:

In [x]: a = np.ones((3, 3))
In [x]: np.vstack((a, np.array((2, 2, 2))))
Out[x]:
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 2.,  2.,  2.]])

Adding a column requires a bit more work, however. You can’t use np.hstack directly:

In [x]: a = np.ones((3, 3))
In [x]: np.hstack((a, np.array((2, 2, 2))))
... [Traceback information] ...
ValueError: all the input arrays must have same number of dimensions


9 NumPy has to copy the data because it has to store its data in one contiguous block of memory and the 
original arrays may be dispersed in different noncontiguous locations. 

10 NumPy does this for efficiency reasons - copying large amounts of data is expensive and not necessary to 
fulfill the function of these splitting methods. 
This is because np.hstack cannot concatenate two arrays with different numbers of
rows. Schematically:

[[ 1., 1., 1.],
 [ 1., 1., 1.],    +    [2., 2., 2.]    =    ?
 [ 1., 1., 1.]]

We can’t simply transpose our new row, either, because it’s a one-dimensional array and 
its transpose is the same shape as the original. So we need to reshape it first: 

In [x]: a = np.ones((3, 3))
In [x]: b = np.array((2, 2, 2)).reshape(3, 1)
In [x]: b
Out[x]:
array([[2],
       [2],
       [2]])
In [x]: np.hstack((a, b))
Out[x]:
array([[ 1.,  1.,  1.,  2.],
       [ 1.,  1.,  1.,  2.],
       [ 1.,  1.,  1.,  2.]])
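
As an aside not covered in the text above, NumPy also provides np.column_stack, which turns one-dimensional arrays into columns before stacking. The sketch below shows that, under this assumption, it gives the same result as the explicit reshape; the name new_col is illustrative.

```python
import numpy as np

a = np.ones((3, 3))
new_col = np.array((2, 2, 2))

# column_stack treats a one-dimensional array as a column,
# so no explicit reshape is needed.
c = np.column_stack((a, new_col))
print(c)

# The same result via the reshape-then-hstack route used in the example.
d = np.hstack((a, new_col.reshape(3, 1)))
print(np.array_equal(c, d))
```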


6.1.6 Indexing and slicing an array 

An array is indexed by a tuple of integers, and as for Python sequences negative indexes 
count from the end of the axis. Slicing and striding is supported in the same way as well. 
For one-dimensional arrays there is only one index: 

In [x]: a = np.linspace(1, 6, 6)
In [x]: print(a)
[ 1.  2.  3.  4.  5.  6.]
In [x]: a[1:4:2]     # elements a[1] and a[3] (a stride of 2)
Out[x]: array([ 2.,  4.])
In [x]: a[3::-2]     # elements a[3] and a[1] (a stride of -2)
Out[x]: array([ 4.,  2.])

Multidimensional arrays have an index for each axis. If you want to select every item 
along a particular axis, replace its index with a single colon: 

In [x]: a = np.linspace(1, 12, 12).reshape(4, 3)
In [x]: a
Out[x]:
array([[  1.,   2.,   3.],
       [  4.,   5.,   6.],
       [  7.,   8.,   9.],
       [ 10.,  11.,  12.]])
In [x]: a[3, 1]
Out[x]: 11.0
In [x]: a[2, :]        # everything in the third row
Out[x]: array([ 7.,  8.,  9.])
In [x]: a[:, 1]        # everything in the second column
Out[x]: array([  2.,   5.,   8.,  11.])
In [x]: a[1:-1, 1:]    # middle rows, second column onwards
Out[x]:
array([[ 5.,  6.],
       [ 8.,  9.]])

Figure 6.1 Various ways to slice a NumPy array: (a) a[2, :], (b) a[:, 1], (c) a[1:-1, 1:], (d) a[::2, :], (e) a[2:, :2], (f) a[1::2, ::2].

These and further examples of NumPy array slicing are illustrated in Figure 6.1. 

The special ellipsis notation (...) is useful for high-rank arrays: in an index, it
represents as many colons as are necessary to represent the remaining axes. For example,
for a four-dimensional array, a[3, 1, ...] is equivalent to a[3, 1, :, :] and
a[3, ..., 1] is equivalent to a[3, :, :, 1].
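
These equivalences can be verified on a small four-dimensional array. The shape used in this sketch is arbitrary and chosen only so that the indexes 3 and 1 are valid.

```python
import numpy as np

a = np.arange(4 * 2 * 3 * 5).reshape(4, 2, 3, 5)

# The ellipsis stands in for the two trailing axes ...
print(np.array_equal(a[3, 1, ...], a[3, 1, :, :]))
# ... or for the two middle axes, depending on where it appears.
print(np.array_equal(a[3, ..., 1], a[3, :, :, 1]))

print(a[3, 1, ...].shape, a[3, ..., 1].shape)
```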

The colon and ellipsis syntax also works for assignment: 

In [x]: a[:, 1] = 0    # set all elements in the second column to zero
In [x]: print(a)
[[  1.   0.   3.]
 [  4.   0.   6.]
 [  7.   0.   9.]
 [ 10.   0.  12.]]


Advanced indexing 

NumPy arrays can also be indexed by sequences that aren’t simple tuples of integers, 
including other lists, arrays of integers and tuples of tuples. Such “advanced indexing” 
creates a new array with its own copy of the data, rather than a view: 

In [x]: a = np.linspace(0., 0.5, 6)
In [x]: print(a)
[ 0.   0.1  0.2  0.3  0.4  0.5]
In [x]: ia = [1, 4, 5]    # a list of indexes
In [x]: print(a[ia])
[ 0.1  0.4  0.5]
In [x]: ia = np.array(((1, 2), (3, 4)))
In [x]: print(a[ia])      # an array to be formed from the specified indexes
[[ 0.1  0.2]
 [ 0.3  0.4]]
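
The practical consequence of the copy is that assigning to the result of advanced indexing leaves the original array untouched, whereas assigning to a slice does not. A minimal sketch, with illustrative variable names and assuming np.shares_memory is available:

```python
import numpy as np

a = np.linspace(0., 0.5, 6)

picked = a[[1, 4, 5]]    # advanced indexing: a new array with its own data
sliced = a[1:4]          # basic slicing: a view on a

picked[0] = -1.          # does not affect a
sliced[0] = -2.          # changes a[1]

print(a)
print(np.shares_memory(a, picked), np.shares_memory(a, sliced))
```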

One can even index a multidimensional array with multidimensional arrays of indexes, 
picking off individual elements at will to build an array of a specified shape. This can 
lead to some rather baroque code: 


In [x]: a = np.linspace(1, 12, 12).reshape(4, 3)
In [x]: print(a)
[[  1.   2.   3.]
 [  4.   5.   6.]
 [  7.   8.   9.]
 [ 10.  11.  12.]]
In [x]: ia = np.array(((1, 0), (2, 1)))
In [x]: ja = np.array(((0, 1), (1, 2)))
In [x]: print(a[ia, ja])
[[ 4.  2.]
 [ 8.  6.]]

Here we build a 2 x 2 array (the shape of the index arrays) whose elements are a[1, 0],
a[0, 1] on the top row and a[2, 1], a[1, 2] on the bottom row.

Instead of indexing an array with a sequence of integers, it is also possible to use an 
array of boolean values. The True elements of this indexing array identify elements in 
the target array to be returned: 


In [x]: a = np.array([-2, -1, 0, 1, 2])
In [x]: ia = np.array([False, True, False, True, True])
In [x]: print(a[ia])
[-1  1  2]


Because comparisons are vectorized across arrays just like mathematical operations, 
this leads to some useful shortcuts: 

In [x]: print(a)
[-2 -1  0  1  2]
In [x]: ib = a < 0
In [x]: print(ib)
[ True  True False False False]
In [x]: a[ib] = 0    # set all negative elements to zero
In [x]: print(a)
[0 0 0 1 2]

It is not actually necessary to store the intermediate boolean array, ib, and a[a<0] = 0
does the same job:

In [x]: a = np.array([-2, -1, 0, 1, 2])
In [x]: a[a<0] = 0
In [x]: print(a)
[0 0 0 1 2]
The boolean operations not, and and or are implemented on boolean arrays with the
operators ~, & and | respectively. For example,

In [x]: years = np.array([1900, 1904, 1990, 1993, 2000, 2014, 2016, 2100])
In [x]: leap_year = (years % 400 == 0) | (years % 4 == 0) & ~(years % 100 == 0)
In [x]: print(list(zip(years, leap_year)))
[(1900, False), (1904, True), (1990, False), (1993, False),
 (2000, True), (2014, False), (2016, True), (2100, False)]
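
The parentheses around each comparison in expressions like this matter, because & and | bind more tightly than the comparison operators. The following sketch (with illustrative variable names) shows the elementwise operators at work and what happens if the Python keyword and is used instead.

```python
import numpy as np

a = np.array([-2, -1, 0, 1, 2])

# Each comparison gets its own parentheses before being combined.
inside = (a > -2) & (a < 2)
outside = ~inside
negative_or_two = (a < 0) | (a == 2)

print(inside)
print(outside)
print(negative_or_two)

# The keywords and/or ask for the truth value of a whole array, which is
# ambiguous for arrays with more than one element, so NumPy raises an error.
try:
    (a > -2) and (a < 2)
except ValueError as err:
    print('ValueError:', err)
```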


Adding an axis 

To add an axis (i.e., dimension) to an array, insert np.newaxis in the desired 
position: 

In [x]: a = np.linspace(1, 4, 4).reshape(2, 2)
In [x]: print(a)    # a 2x2 array (rank=2)
[[ 1.  2.]
 [ 3.  4.]]
In [x]: a.shape
(2, 2)
In [x]: b = a[:, np.newaxis, :]
In [x]: print(b)    # a 2x1x2 array (rank=3)
[[[ 1.  2.]]
 [[ 3.  4.]]]
In [x]: b.shape
(2, 1, 2)

In fact, np.newaxis is the None object, so None can be used directly in its place if
desired.
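
A short check of this equivalence, as a sketch with illustrative variable names:

```python
import numpy as np

a = np.linspace(1, 4, 4).reshape(2, 2)

print(np.newaxis is None)

b = a[:, np.newaxis, :]    # new axis in the middle
c = a[:, None, :]          # exactly the same thing, spelled with None
d = a[..., None]           # new axis at the end

print(b.shape, c.shape, d.shape)
```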


Example E6.4 A Sudoku square consists of a 9 x 9 grid with entries such that each 
row, column and each of the 9 nonoverlapping 3x3 tiles contains the numbers 1-9 
once only. The following program verifies that a provided grid is a valid Sudoku 
square. 

Listing 6.2 Verifying the validity of a Sudoku square 


import numpy as np

def check_sudoku(grid):
    """ Return True if grid is a valid Sudoku square, otherwise False. """
    for i in range(9):
        # j, k index the top left-hand corner of each 3x3 tile
        j, k = (i // 3) * 3, (i % 3) * 3
        if len(set(grid[i,:])) != 9 or len(set(grid[:,i])) != 9\
                or len(set(grid[j:j+3, k:k+3].ravel())) != 9:    # (1)
            return False
    return True

sudoku = """145327698
839654127
672918543
496185372
218473956
753296481
367542819
984761235
521839764"""

# Turn the provided string, sudoku, into an integer array
grid = np.array([[int(i) for i in line] for line in sudoku.split()])
print(grid)

if check_sudoku(grid):
    print('grid valid')
else:
    print('grid invalid')

(1) Here we use the fact that an array of length 9 contains nine unique elements if the
set formed from these elements has cardinality 9. No check is made that the elements
themselves are actually the numbers 1-9.
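
If the missing check matters, one possible variant is sketched below. It is not part of the listing above: the function name check_sudoku_strict is hypothetical, and it compares each sorted row, column and tile with the array [1, 2, ..., 9] instead of counting unique elements.

```python
import numpy as np

def check_sudoku_strict(grid):
    """Return True only if every row, column and 3x3 tile of grid
    holds exactly the numbers 1-9."""
    grid = np.asarray(grid)
    if grid.shape != (9, 9):
        return False
    digits = np.arange(1, 10)
    for i in range(9):
        # j, k index the top left-hand corner of each 3x3 tile
        j, k = (i // 3) * 3, (i % 3) * 3
        tile = grid[j:j+3, k:k+3].ravel()
        for group in (grid[i, :], grid[:, i], tile):
            if not np.array_equal(np.sort(group), digits):
                return False
    return True

sudoku = """145327698 839654127 672918543
            496185372 218473956 753296481
            367542819 984761235 521839764"""
grid = np.array([[int(i) for i in line] for line in sudoku.split()])

print(check_sudoku_strict(grid))        # the grid from the example
print(check_sudoku_strict(grid + 1))    # unique entries, but 2-10 instead of 1-9
```

The second grid would pass a test based only on the number of unique elements, which is the case the stricter comparison is meant to catch.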


Meshes 

To evaluate a multidimensional function on a grid of points, a mesh is useful. The
function np.meshgrid is passed a series of N one-dimensional arrays representing
coordinates along each dimension and returns a set of N-dimensional arrays comprising
a mesh of coordinates at which the function can be evaluated. For example, in the
two-dimensional case:

In [x]: x = np.linspace(0, 5, 6)
In [x]: y = np.linspace(0, 3, 4)
In [x]: X, Y = np.meshgrid(x, y)
In [x]: X
Out[x]:
array([[ 0.,  1.,  2.,  3.,  4.,  5.],
       [ 0.,  1.,  2.,  3.,  4.,  5.],
       [ 0.,  1.,  2.,  3.,  4.,  5.],
       [ 0.,  1.,  2.,  3.,  4.,  5.]])
In [x]: Y
Out[x]:
array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.,  3.]])


The arrays X and Y can each be indexed with indexes i, j: the x array is repeated as
rows down X and the y array as columns across Y. A function of two coordinates can
therefore be evaluated on the grid as simply f(X, Y).
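
As a concrete illustration, the sketch below evaluates an arbitrary example function on the mesh; the function f here is hypothetical and chosen only because it is built from vectorized NumPy operations.

```python
import numpy as np

def f(x, y):
    # An arbitrary example function of two coordinates.
    return np.sin(x) * np.cos(y)

x = np.linspace(0, 5, 6)
y = np.linspace(0, 3, 4)
X, Y = np.meshgrid(x, y)

Z = f(X, Y)       # one function value for every point on the grid
print(Z.shape)

# Z[i, j] holds the value of f at the point (x[j], y[i]).
print(np.isclose(Z[2, 5], f(x[5], y[2])))
```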

Setting the optional argument sparse to True will return a sparse grid to conserve
memory. In the previous example, instead of two arrays, both with shapes (4, 6),
arrays with shapes (1, 6) and (4, 1) that can be broadcast against each other (see
Section 6.1.7) will be returned:

In [x]: X, Y = np.meshgrid(x, y, sparse=True)
In [x]: X
Out[x]: array([[ 0.,  1.,  2.,  3.,  4.,  5.]])
In [x]: Y
Out[x]:
array([[ 0.],
       [ 1.],
       [ 2.],
       [ 3.]])


6.1.7 0 Broadcasting 

We have already seen that simple operations such as addition and multiplication can be
carried out elementwise on two arrays of the same shape (vectorization):

In [x]: a = np.array([1, 2, 3])
In [x]: b = np.array([0, 10, 100])
In [x]: a * b
Out[x]: array([  0,  20, 300])


Broadcasting describes the rules that NumPy uses to carry out such operations when
the arrays have different shapes. This allows the operation to be carried out using
precompiled C loops instead of slower Python loops, but there are constraints as to
which array shapes can be broadcast against each other. The rules are applied on each
dimension of the arrays, starting with the last and working backward. Two dimensions
compared in this way are said to be compatible if they are equal or one of them is 1.
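
The compatibility rule can be written out as a few lines of Python. The function below is only an illustrative sketch of the rule as stated here (the name broadcast_shape is hypothetical, and this is not how NumPy implements it internally); its result is compared with the shape NumPy itself reports through np.broadcast.

```python
import numpy as np
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the shape that results from broadcasting two shapes, or raise
    ValueError if they are not compatible."""
    result = []
    # Compare dimensions starting with the last and working backward;
    # a dimension missing from the shorter shape is treated as length 1.
    for m, n in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if m == n or n == 1:
            result.append(m)
        elif m == 1:
            result.append(n)
        else:
            raise ValueError('shapes {} and {} cannot be broadcast together'
                             .format(shape_a, shape_b))
    return tuple(reversed(result))

print(broadcast_shape((2, 3), (3,)))
print(broadcast_shape((1, 6), (4, 1)))

# Cross-check against NumPy for one pair of shapes.
print(np.broadcast(np.empty((1, 6)), np.empty((4, 1))).shape)
```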

The simplest example of broadcasting involves the operation between an array and 
a scalar (which may be considered for this purpose to be a one-dimensional array of 
length 1). Consider 

In [x]: a = np.array([[1, 2, 3], [4, 5, 6]])
In [x]: b = 2
In [x]: c = a * b
In [x]: c
Out[x]:
array([[ 2,  4,  6],
       [ 8, 10, 12]])

The dimensions of a and b are compatible:

a: 2 x 3
b:     1
c: 2 x 3

Here, b can be broadcast across the two dimensions of array a by repetition of its value 
for every element in that array. Similarly, an array of shape (3,) can be broadcast across 
both rows of a: 


In [x]: b = np.array([1, 2, 3])
In [x]: a * b
Out[x]:
array([[ 1,  4,  9],
       [ 4, 10, 18]])

a: 2 x 3
b:     3
c: 2 x 3

That is, for each row of a, its entries are multiplied by the corresponding entries of
the one-dimensional array b. However, attempting to multiply a by an array whose last
dimension is not 1 or 3 is a ValueError here:

In [x]: b = np.array([1, 2])
In [x]: a * b
----> 1 a * b
ValueError: operands could not be broadcast together with shapes (2,3) (2,)

In the example of the sparse mesh created in the previous section, the arrays with 
shapes (1,6) and (4,1) are compatible. For example, 


In [x]: f = X * Y
In [x]: f
Out[x]:
array([[  0.,   0.,   0.,   0.,   0.,   0.],
       [  0.,   1.,   2.,   3.,   4.,   5.],
       [  0.,   2.,   4.,   6.,   8.,  10.],
       [  0.,   3.,   6.,   9.,  12.,  15.]])

The broadcasting process “stretches out” the second axis of Y from 1 to 6 to match that
of X and the first axis of X from 1 to 4 to match that of Y:

X: 1 x 6
Y: 4 x 1
f: 4 x 6

To force a broadcast on an array with insufficient dimensions to meet your requirements,
you can always add an axis with np.newaxis. For example, one way to take the
outer product of two arrays is by adding a dimension to one of them and broadcasting
the multiplication:

In [x]: a = np.array([1, 2, 3])
In [x]: b = np.array([0, 10, 100])
In [x]: c = a[:, np.newaxis] * b
In [x]: c
Out[x]:
array([[  0,  10, 100],
       [  0,  20, 200],
       [  0,  30, 300]])

Thus, instead of matching elements in the two arrays with shapes (3,), the extra
axis on a creates an array with shape (3, 1) and this dimension is stretched across the
array b:

a[:, np.newaxis]: 3 x 1
               b:     3
               c: 3 x 3
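
The broadcast product can be compared with NumPy's own outer-product function, np.outer, as in this brief sketch:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([0, 10, 100])

# Row i of the result is a[i] multiplied by every element of b.
c = a[:, np.newaxis] * b
print(c)

print(np.array_equal(c, np.outer(a, b)))
```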
6.1.8 Maximum and minimum values

NumPy arrays have the methods min and max, which return the minimum and maximum
values in the array. By default, a single value for the flattened array is returned; to
find maximum and minimum values along a given axis, use the axis argument:

In [x]: a = np.array([[3, 0, -1, 1], [2, -1, -2, 4], [1, 7, 0, 4]])
In [x]: print(a)
[[ 3  0 -1  1]
 [ 2 -1 -2  4]
 [ 1  7  0  4]]
In [x]: a.min()    # "global" minimum
Out[x]: -2
In [x]: a.max()    # "global" maximum
Out[x]: 7
In [x]: print(a.min(axis=0))
[ 1 -1 -2  1]    # minima in each column
In [x]: print(a.max(axis=1))
[3 4 7]          # maxima in each row

Often one wants not the maximum (or minimum) value itself but its index in the array.
This is what the methods argmin and argmax do. By default, the index returned is into
the flattened array, so the actual value can be retrieved using a view on the array created
by ravel:

In [x]: a.argmin()
6
In [x]: a.ravel()[a.argmin()]
-2
In [x]: print(a.argmax(axis=0))
[0 2 2 1]    # row indexes of maxima in each column
In [x]: print(a.argmax(axis=1))
[0 3 1]      # column indexes of maxima in each row

Figure 6.2 illustrates the process for axis=0 and for axis=1. Notice that if more
than one equal maximum exists in a column, the index of the first is returned.
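
An index into the flattened array can also be converted back to a row and column with np.unravel_index, a function not discussed in the text above. A minimal sketch using the same array:

```python
import numpy as np

a = np.array([[3, 0, -1, 1], [2, -1, -2, 4], [1, 7, 0, 4]])

flat_index = a.argmin()                             # index into the flattened array
row, col = np.unravel_index(flat_index, a.shape)    # the corresponding 2D index
print(flat_index, row, col, a[row, col])

row_max, col_max = np.unravel_index(a.argmax(), a.shape)
print(row_max, col_max, a[row_max, col_max])
```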


Example E6.5 Consider the following oscillating functions on the interval [0, L]:

$$ f_n(x) = x(L - x)\sin\frac{2\pi x}{\lambda_n}; \quad \lambda_n = \frac{2L}{n}, \quad n = 1, 2, 3, \cdots. $$

The following code defines a two-dimensional array holding values of these functions
for L = 1 on a grid of N = 100 points (rows) for n = 1, 2, ..., 5 (columns).
The position of the maximum and minimum in each column is calculated with
argmax(axis=0) and argmin(axis=0). (See Figure 6.3.)

Listing 6.3 argmax and argmin

# eg6-array_maxmin.py
import numpy as np
import pylab

N = 100
L = 1

def f(i, n):
    x = i * L / N
    lam = 2*L/(n+1)
    return x * (L-x) * np.sin(2*np.pi*x/lam)

a = np.fromfunction(f, (N+1, 5))
min_i = a.argmin(axis=0)
max_i = a.argmax(axis=0)
pylab.plot(a, c='k')
pylab.plot(min_i, a[min_i, np.arange(5)], 'v', c='k', markersize=10)
pylab.plot(max_i, a[max_i, np.arange(5)], '^', c='k', markersize=10)
pylab.xlabel(r'$x$')
pylab.ylabel(r'$f_n(x)$')
pylab.show()

Figure 6.2 (a) a.max(axis=0) giving the maximum values and a.argmax(axis=0)
giving the indexes of the maximum values of each column in array a (that is, maintaining the
row dimension) and (b) the same for axis=1: maximum values along each row.


Figure 6.3 Maxima and minima of the functions f_n(x) described in Example E6.5. Note that only
the “global” maximum and minimum are returned for each function, and that where more than
one point has the same maximum or minimum value, only the first is returned.


6.1.9 Sorting an array

NumPy arrays can be sorted in several different ways with the sort method, which
orders the numbers in an array in place. By default, this method sorts multidimensional
arrays along their last axis. To sort along some other axis, set the axis argument. For
example,

In [x]: a = np.array([5, -1, 2, 4, 0, 4])
In [x]: a.sort()
In [x]: print(a)
[-1  0  2  4  4  5]
In [x]: b = np.array([[0, 3, -2], [7, 1, 3], [4, 0, -1]])
In [x]: print(b)
[[ 0  3 -2]
 [ 7  1  3]
 [ 4  0 -1]]
In [x]: b.sort()    # sort the numbers along each row
In [x]: print(b)
[[-2  0  3]
 [ 1  3  7]
 [-1  0  4]]

This is the same as b.sort(axis=1) - “for each row, order the numbers by column.”
To sort the numbers in each column - “for each column, order the numbers by row” - set
axis=0:

In [x]: b = np.array([[0, 3, -2], [7, 1, 3], [4, 0, -1]])
In [x]: b.sort(axis=0)    # sort the numbers along each column
In [x]: print(b)
[[ 0  0 -2]
 [ 4  1 -1]
 [ 7  3  3]]

The sorting algorithm used is the “quicksort” algorithm, which is a good general- 
purpose choice. 11 


11 Some arrays can be sorted faster with the alternative mergesort or heapsort algorithms; these can be
selected by setting the optional kind argument to the string literal values 'mergesort' and 'heapsort',
for example: b.sort(axis=1, kind='heapsort').




Two other sorting functions are worth mentioning. np.argsort returns the indexes
that would sort an array rather than the sorted elements themselves:

In [x]: a = np.array([3, 0, -1, 1]) 

In [x]: np.argsort(a) 

Out[x]: array([2, 1, 3, 0]) 

Therefore, 

In [x]: a[np.argsort(a)] 

Out[x]: array([-1, 0, 1, 3]) 

np.argsort also takes the axis and kind arguments previously described.

The function np.searchsorted takes a sorted array, a, and one or more values, v,
and returns the indexes in a at which the values should be inserted to maintain its order:

In [x]: a = np.array([1, 2, 3, 4])
In [x]: np.searchsorted(a, 3.5)
Out[x]: 3
In [x]: np.searchsorted(a, (3.5, 0, 1.1))
Out[x]: array([3, 0, 1])
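
The index returned can be passed straight to np.insert to add the new value while keeping the array sorted. The sketch below also shows the optional side argument of np.searchsorted, which decides where a value equal to an existing element is placed; neither np.insert nor side is discussed in the text above, and the variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
v = 3.5

i = np.searchsorted(a, v)
# Convert to float first: inserting 3.5 into an integer array would truncate it.
a_new = np.insert(a.astype(float), i, v)
print(a_new)

# For a value already present, side selects the position before ('left',
# the default) or after ('right') the existing entry.
print(np.searchsorted(a, 3), np.searchsorted(a, 3, side='right'))
```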


6.1.10 Structured arrays 

Also known as record arrays, structured arrays are arrays consisting of rows of values
where each value may have its own data type and name. These rows are the “records.”
This is very much like a table of data with rows (records) consisting of values that fall
into columns (fields) and provides a very convenient and natural way to manipulate
scientific data that is often obtained or presented in tabular form.

Creating a structured array 

The structure of a record array is defined by its dtype using a more complex syntax 
than we have used previously. For example, 

In [x]: a = np.zeros(5, dtype='int8, float32, complex_')
In [x]: print(a)
[(0, 0.0, 0j) (0, 0.0, 0j) (0, 0.0, 0j) (0, 0.0, 0j) (0, 0.0, 0j)]
In [x]: a.dtype
dtype([('f0', '|i1'), ('f1', '<f4'), ('f2', '<c16')])

Here we have created an array of five records, each of which has three fields, defined by
constructing a dtype specified by the string 'int8, float32, complex_'.

• The first field is a single-byte, signed integer (int8), which is described by the
  string '|i1' - clearly the endianness (byte order) is not relevant in a one-byte
  quantity;

• The second is a single-precision floating point number, which (on my system) is
  stored in memory as a little-endian 4-byte sequence, indicated by '<f4';

• The final field is defined to be a complex number to default precision, which
  on my system is stored in 16 bytes, little-endian (complex_ is equivalent to
  complex128, which corresponds to a data type '<c16').

Because we did not explicitly name the fields, they are given the default names 'f0',
'f1' and 'f2'. To name the fields of our structured array explicitly, pass the dtype
constructor a list of (name, dtype descriptor) tuples: for example,

In [x]: dt = np.dtype( [('time', 'f8'), ('signal', 'i4')] ) 

In [x]: a = np.zeros(10, dtype=dt) 

In [x] : a 
Out [x] : 

array([(0.0, 0), (0.0, 0), ..., (0.0, 0)], 

dtype=[('time', '<f8'), ('signal', '<i4')]) 

A structured array can therefore be visualized as a table of data values with column 
headings for each field. 

Assigning records in a structured array is as expected:

In [x]: a[0] = (0., 4)
In [x]: a[1:3] = [(0.5, -3), (1., -5)]
In [x]: a
Out[x]:
array([(0.0, 4), (0.5, -3), (1.0, -5), ..., (0.0, 0)],
      dtype=[('time', '<f8'), ('signal', '<i4')])

but the real power of this approach is in the ability to reference a field by its name. For
example, to set the 'time' column in our array to a linear sequence:

In [x]: a['time'] = np.linspace(0., 4.5, 10) 

In [x]: print (a) 

[(0.0, 4) (0.5, -3) (1.0, -5) (1.5, 0) (2.0, 0) (2.5, 0) (3.0, 0) (3.5, 0) 

(4.0, 0) (4.5, 0)] 

In [x]: print (a['time'][-1]) 

4.5 

Likewise, to obtain a view on a column, refer to it by name: 

In [x]: print (a['time']) 

[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5] 

In [x]: print(a['signal'].min())

-5 
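
Field names combine naturally with the boolean indexing of Section 6.1.6. The following sketch rebuilds the same (time, signal) array and selects the records with a negative signal; the variable name negative is illustrative.

```python
import numpy as np

dt = np.dtype([('time', 'f8'), ('signal', 'i4')])
a = np.zeros(10, dtype=dt)
a['time'] = np.linspace(0., 4.5, 10)
a[0] = (0., 4)
a[1:3] = [(0.5, -3), (1., -5)]

# Use one field to build a boolean index that selects whole records.
negative = a[a['signal'] < 0]
print(negative)
print(negative['time'])

# Because a['signal'] is a view, assigning through it changes a itself.
a['signal'][3] = 7
print(a[3])
```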


More ways to create a structured array 

There are several (arguably, too many) ways to define the dtype describing a structured
array. So far we have used a string of comma-separated identifiers and a list of tuples. A
third way is to use a dictionary. The basic usage assigns a list of values to the two keys
'names' and 'formats' naming the fields and specifying their formats respectively:

In [x]: dt = np.dtype({'names': ['time', 'signal'],
                       'formats': ['f8', 'i4']
                      })
In [x]: a = np.zeros(10, dtype=dt)

defines the same structured array of (time, signal) records as before. A third key,
'titles', can be used to give each field a more detailed description; each title can
then be used as an alias to its name in referring to that field in the array. 12


12 In fact, title can be any Python object and can be used to provide detailed “metadata” concerning the 
corresponding field. 
In [x]: dt = np.dtype({'names': ['candidate', 'mark', 'grade'],
                       'formats': ['|S50', 'u1', '|S2'],
                       'titles': ['Candidate Name', 'Percentage Mark', ...]})
In [x]: a = np.zeros(10, dtype=dt)
In [x]: a[0] = ('John Brown', 64, 'B-')
In [x]: a[1] = ('Jane Smith', 78, 'A')
In [x]: print(a['Candidate Name'])
[b'John Brown' b'Jane Smith' b'' b'' b'' b'' b'' b'' b'' b'']
In [x]: print(a['Percentage Mark'])
[64 78  0  0  0  0  0  0  0  0]



Sorting structured arrays 

Structured arrays can be sorted by giving a specific order to the fields used with the 
order argument. For example, with the following structured array: 

In [x]: data = [ ('NiCd', 1.2, 0.14, 2000), 

('Lead acid', 2.1, 0.14, 700), 

('Lithium ion', 3.6, 0.46, 800) ] 

In [x]: dtype = [ ('name', '|S20'),

('voltage', 'f8'), 

('specific energy', 'f8'), 

('cycle durability', 'i4') ] 

In [x]: a = np.array(data, dtype=dtype) 

In [x]: a.sort(order='specific energy') 

In [x]: print(a) 

[(b'Lead acid', 2.1, 0.14, 700) (b'NiCd', 1.2, 0.14, 2000) 

(b'Lithium ion', 3.6, 0.46, 800)] 

In [x]: a.sort(order=['specific energy', 'voltage']) 

In [x]: print(a) 

[(b'NiCd', 1.2, 0.14, 2000) (b'Lead acid', 2.1, 0.14, 700) 

(b'Lithium ion', 3.6, 0.46, 800)] 


The second sort operation here sorts the records by specific energy, and if this is the 
same for two or more records, then it sorts by voltage. 


6.1.11 Arrays as vectors 

Although NumPy provides a matrix class that specializes ndarray to make linear
algebra calculations easier and can be used to represent vectors, for many purposes it
is just as convenient to define a vector with n components as a regular one-dimensional
array with n elements.

In addition to elementwise operations such as vector addition, subtraction and so
on, NumPy array objects implement scalar (dot) product and vector (cross) product
methods:

In [x]: a = np.array([1, 0, -3])    # vector as a one-dimensional array
In [x]: b = np.array([2, -2, 5])
In [x]: a.dot(b)    # or b.dot(a) or np.dot(a,b)
Out[x]: -13
In [x]: np.cross(a, b)
Out[x]: array([ -6, -11,  -2])
You can only take the cross product of an array with two or three elements; the third
component is assumed to be zero in the former case. To use dot and cross on two
individual vectors, ensure that they are row vectors as described previously and not
column vectors represented as an (n, 1) array:

In [x]: a = np.array([[1], [0], [-3]])    # 3x1 two-dimensional array
In [x]: b = np.array([[2], [-2], [5]])
In [x]: print(a)
[[ 1]
 [ 0]
 [-3]]
In [x]: np.dot(a, b)    # tries matrix multiplication: won't work
ValueError: objects are not aligned

If you do want to take the dot product of two column vectors using np.dot, they need
to be turned into row vectors:

In [x]: np.dot(a.T[0], b.T[0])    # transpose to row vectors
Out[x]: -13

This is a bit tortuous: the index is needed because the transpose of our (n, 1)
(two-dimensional) array is a (1, n) array from which we want the first and only row for
our vector. Alternatively, we can operate using a flattened view of the column vectors
obtained with ravel:

In [x]: a.ravel().dot(b.ravel()) 

Out[x]: -13 
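
Two further routes to the same number, neither of which is described in the text above, are sketched here: np.vdot, which flattens its arguments before taking the dot product, and the matrix product of the transposed column vector with the other column vector, which yields a 1 x 1 array.

```python
import numpy as np

a = np.array([[1], [0], [-3]])    # 3x1 column vectors
b = np.array([[2], [-2], [5]])

# np.vdot flattens both arguments to one dimension first.
print(np.vdot(a, b))

# (1x3) times (3x1) is a valid matrix product; the result is a 1x1 array,
# so the scalar still has to be picked out by index.
product = a.T.dot(b)
print(product, product[0, 0])
```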

See also Section 6.6. 


6.1.12 Logic and comparisons 

NumPy provides a set of methods for comparing and performing logical operations on 
arrays elementwise. The more useful of these are summarized in Table 6.4. 


Table 6.4 ndarray Attributes

Function             Description
np.all(a)            Determine whether all array elements of a evaluate to True.
np.any(a)            Determine whether any array element of a evaluates to True.
np.isreal(a)         Determine whether each element of array a is real.
np.iscomplex(a)      Determine whether each element of array a is a complex number.
np.isclose(a, b)     Return a boolean array of the comparison between arrays a and
                     b for equality within some tolerance.
np.allclose(a, b)    Return True if all the elements in the arrays a and b are equal
                     to within some tolerance.

np.all and np.any work the same as Python’s built-in functions of the same name 13
(see Section 2.4.3):

In [x]: a = np.array([[1, 2, 0, 3], [4, 0, 1, 1]]) 

In [x]: np.any(a), np.all(a) 

Out[x]: (True, False) # Some (but not all) elements are equivalent to True 


np.isreal and np.iscomplex return boolean arrays:

In [x]: b = np.array([1, -1j, 0.5j, 0, 1-2.5j])
In [x]: np.isreal(b)
Out[x]: array([ True, False, False,  True, False], dtype=bool)
In [x]: np.iscomplex(b)
Out[x]: array([False,  True,  True, False,  True], dtype=bool)

Because the representation of floating point numbers is not exact, comparing two
float or complex arrays with the == operator is not always reliable and is not
recommended. Instead, the best we can do is see if two values are “close” to one another within
some (typically small) absolute or relative tolerance - NumPy provides the function
np.isclose(a, b) for elementwise comparisons of two arrays: it returns True for
elements satisfying

abs(a-b) <= (atol + rtol * abs(b))

with absolute tolerance, atol, and relative tolerance, rtol, which are 10^-8 and 10^-5
respectively by default but can be changed by setting the corresponding arguments. 14
An additional argument, equal_nan, defaults to False, meaning that nan values in
corresponding positions in the two arrays are treated as different; to treat such elements
as equal, set equal_nan=True.

In [x]: a = np.array([1.66e-27, 1.38e-23, 6.63e-34, 6.02e23, np.nan]) 

In [x]: b = np.array([1.66e-27, 1.66e-27, 1.66e-27, 6.00e23, np.nan]) 

In [x]: np.isclose(a, b) 

Out[x]: array([ True, True, True, False, False], dtype=bool) 

In [x]: np.isclose(a, b, equal_nan=True) 

Out[x]: array([ True, True, True, False, True], dtype=bool) 

Note that small numbers compare as equal even though they may differ by many orders 
of magnitude - to correct this, set atol = 0 to compare within relative tolerance only: 

In [x]: np.isclose(a, b, atol=0) 

Out[x]: array([ True, False, False, False, False], dtype=bool) 

Finally, np.allclose(a, b) returns a single value: True only if every element in a is 
equal to the corresponding element in b (within the tolerance defined by atol and 
rtol), and otherwise False. 

In [x]: x = np.linspace(0, np.pi, 100) 

In [x]: np.allclose(np.sin(x)**2, 1 - np.cos(x)**2) 

Out[x]: True 
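
The tolerance test used by np.isclose scales the relative tolerance by abs(b) only, so swapping the arguments can change the result. The following is a minimal sketch of this; the values of a, b and rtol are illustrative choices (not taken from the text), picked so that the difference falls between the two thresholds:

```python
import numpy as np

# Illustrative values: a deliberately loose relative tolerance makes the
# asymmetry of the test easy to see.
a, b = 10.0, 11.0
rtol, atol = 0.095, 0.0

# np.isclose tests abs(a - b) <= atol + rtol * abs(b), so the SECOND
# argument sets the scale of the relative tolerance.
forward = np.isclose(a, b, rtol=rtol, atol=atol)    # 1 <= 0.095 * 11 = 1.045
backward = np.isclose(b, a, rtol=rtol, atol=atol)   # 1 <= 0.095 * 10 = 0.95 fails

print(forward, backward)
```

With the default (much smaller) tolerances the two orderings rarely disagree, but it is worth passing the reference value as the second argument when they might.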


13 Except that they don’t work on generator or iterator objects. 

14 Note that this relation is not symmetric in a and b, so it is possible that isclose(a, b) may not equal 
isclose(b, a). 
6.1.13 Exercises 
Questions 


Q6.1.1 What is the difference between the objects np.ndarray and np.array? 
Q6.1.2 Why doesn’t this create a two-dimensional array? 

>>> np.array((1,0,0), (0,1,0), (0,0,1), dtype=float) 

What is the correct way? 

Q6.1.3 What is the difference, if any, between the following statements: 

>>> a = np.array([0,0,0]) 

>>> a = np.array([[0,0,0]]) 

Q6.1.4 Explain the following behavior: 

In [x]: a, b = np.zeros((3,)), np.ones((3,)) 

In [x]: a.dtype = 'int' 

In [x]: a 

Out[x]: array([0, 0, 0]) 

In [x]: b.dtype = 'int' 

In [x]: b 

Out[x]: array([4607182418800017408, 4607182418800017408, 4607182418800017408]) 

What is the correct way to convert an array of one data type to an array of another? 
Q6.1.5 A 3 x 4 x 4 array is created with 

In [x]: a = np.linspace(1,48,48).reshape(3,4,4) 

Index or slice this array to obtain the following: 

a. 20.0 

b. [ 9. 10. 11. 12.] 

c. The 4x4 array: 

[[ 33. 34. 35. 36.] 
 [ 37. 38. 39. 40.] 
 [ 41. 42. 43. 44.] 
 [ 45. 46. 47. 48.]] 

d. The 3x2 array: 

[[  5.,  6.], 
 [ 21., 22.], 
 [ 37., 38.]] 

e. The 4x2 array: 

[[ 36. 35.] 
 [ 40. 39.] 
 [ 44. 43.] 
 [ 48. 47.]] 

f. The 3x4 array: 

[[ 13.  9.  5.  1.] 
 [ 29. 25. 21. 17.] 
 [ 45. 41. 37. 33.]] 


g. (Harder) Using an array of indexes, the 2 x 2 array: 

[[ 1. 4.] 

[ 45. 48.]] 


Q6.1.6 Write an expression, using boolean indexing, which returns only the values 
from an array that have magnitudes between 0 and 1. 


Q6.1.7 Why does the following statement evaluate to True even though the two 
numbers passed to np.isclose() differ by more than atol? 

In [x]: np.isclose(-2.00231930436153, -2.0023193043615, atol=1.e-14) 

Out[x]: True 


Q6.1.8 Explain why the following evaluates to True even though the two 
approximations to $\pi$ differ by more than $10^{-16}$: 

In [x]: np.isclose(3.1415926535897932, 3.141592653589793, atol=1.e-16, rtol=0) 

Out[x]: True 

whereas this statement works as expected: 

In [x]: np.isclose(3.14159265358979, 3.1415926535897, atol=1.e-14, rtol=0) 

Out[x]: False 


Q6.1.9 Verify that the magic square created in Example E6.2 satisfies the conditions 
that it contains the numbers 1 to $N^2$ and that its rows, columns and main diagonals sum 
to $N(N^2 + 1)/2$. 

Q6.1.10 Write a one-line statement that returns True if an array is a monotonically 
increasing sequence or False otherwise. 

Hint: np.diff returns the difference between consecutive elements of a sequence. 
For example, 

In [x]: np.diff([1,2,3,3,2]) 

Out[x]: array([ 1, 1, 0, -1]) 

◊ Q6.1.11 (Harder) The dtype np.uint8 represents an unsigned integer in 8 bits. Its 
value may therefore be in the range 0 to 255. Explain the following behavior: 

In [x]: x = np.uint8(250) 

In [x]: x*2 
Out[x]: 500 

In [x]: x = np.array([250,], dtype=np.uint8) 

In [x]: x*2 

Out[x]: array([244], dtype=uint8) 
Problems 

P6.1.1 Turn the following data concerning various species of cetacean into a NumPy 
structured array and order it by (a) mass and (b) population. Determine in each case the 
index at which Bryde’s whale (population: 100000, mass: 25 tonnes) should be inserted 
to keep the array ordered. 


Name                            Population    Mass/tonnes 
Bowhead whale                   9000          60 
Blue whale                      20000         120 
Fin whale                       100000        70 
Humpback whale                  80000         30 
Gray whale                      26000         35 
Atlantic white-sided dolphin    250000        0.235 
Pacific white-sided dolphin     1000000       0.15 
Killer whale                    100000        4.5 
Narwhal                         25000         1.5 
Beluga                          100000        1.5 
Sperm whale                     2000000       50 
Baiji                           13            0.13 
North Atlantic right whale      300           75 
North Pacific right whale       200           80 
Southern right whale            7000          70 


P6.1.2 The shoelace algorithm for calculating the area of a simple polygon (that is, 
one without holes or self-intersections) proceeds as follows: Write down the (x, y) 
coordinates of the N vertices in an N x 2 array and then repeat the coordinates of 
the first vertex as the last row to make an (N + 1) x 2 array. Now (a) multiply each 
x-coordinate value in the first N rows by the y-coordinate value in the next row down 
and take the sum, $S_1 = x_1 y_2 + x_2 y_3 + \cdots + x_N y_1$. Then (b) multiply each y-coordinate 
value in the first N rows by the x-coordinate in the next row down and take the sum, 
$S_2 = y_1 x_2 + y_2 x_3 + \cdots + y_N x_1$. The area of the polygon is then $\frac{1}{2}|S_1 - S_2|$. 

[Figure: the two columns of vertex coordinates, with lines joining the pairs of values multiplied in steps (a) and (b).] 


Implement this algorithm as a function that takes a NumPy array of vertices as its 
argument and returns the area of the polygon. Do not use Python loops! 

P6.1.3 Using NumPy, it is possible to do this exercise without using a single (Python) 
loop. 
The normalized Gaussian function with mean $\mu$ and standard deviation $\sigma$ is 

$$g(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right).$$ 

Write a program to calculate and plot the Gaussian functions with $\mu = 0$ and the three 
values $\sigma = 0.5, 1, 1.5$. Use a grid of 1,000 points in the interval $-10 \le x \le 10$. 

Verify (by direct summation) that the functions are normalized with area 1. 

Finally, calculate the first derivative of these functions on the same grid using the 
first-order central difference approximation: 

$$g'(x) \approx \frac{g(x + h) - g(x - h)}{2h}$$ 

for some suitably chosen, small h. 


6.2 Reading and writing an array to a file 

Scientific data are frequently read in from a text file, which may contain comments, 
missing values and blank lines. Columns of values may be either aligned in a 
fixed-width format or separated by one or more delimiting characters (such as spaces, tabs or 
commas). Furthermore, there may be a descriptive header and even footnotes to the file, 
which make it hard to parse directly using Python’s string methods. 

NumPy provides several functions for reading data from a text file. The simpler 
np.loadtxt handles many common cases; the more sophisticated np.genfromtxt 
allows for better handling of missing values and footers. These are described in the 
following sections. 

6.2.1 np.save and np.load 

There is a platform-independent binary format for saving a NumPy array: 

In [x]: np.save('my-array.npy', a) 

will save the array a to the binary file my-array.npy (the .npy extension is appended 
if it is not provided). The array can then be reloaded using NumPy on any other 
operating system with 

In [x]: a = np.load('my-array.npy') 

(the .npy extension must be provided). 
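
As a quick check of the extension handling, the following sketch saves an array without giving the extension and then reloads it. The temporary directory and the array contents are illustrative choices, not part of the original example:

```python
import os
import tempfile
import numpy as np

a = np.arange(12, dtype=float).reshape(3, 4)

# Work in a throwaway directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'my-array')   # no extension given
    np.save(path, a)                          # np.save appends .npy
    saved_files = os.listdir(tmpdir)
    b = np.load(path + '.npy')                # the extension is required here

print(saved_files)
print(np.array_equal(a, b), b.dtype, b.shape)
```

The reloaded array has the same shape, dtype and values as the original.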

6.2.2 np.loadtxt 

The method prototype for np.loadtxt is 

np.loadtxt(fname, dtype=<class 'float'>, comments='#', 
           delimiter=None, converters=None, skiprows=0, 
           usecols=None, unpack=False, ndmin=0) 
The arguments are as follows: 

• fname: The only required argument, fname, which can be a filename, an open 
file, or a generator returning the lines of data to be parsed. 

• dtype: The data type of the array defaults to float but can be set explicitly by 
the dtype argument. In particular this is the place to set up names and types for 
a structured array (see Section 6.1.10). 

• comments: Comments in a file are usually started by some character such as # 
(as with Python) or %. To tell NumPy to ignore the contents of any line following 
this character, use the comments argument; by default it is set to #. 

• delimiter: The string used to separate columns of data in the file; by default it 
is None, meaning that any amount of whitespace (spaces, tabs) delimits the data. 
To read a comma-separated (csv) file, set delimiter=','. 

• converters: An optional dictionary mapping the column index to a function 
converting string values in that column to data (e.g., float). 

• skiprows: An integer giving the number of lines at the start of the file to skip 
over before reading the data (e.g., to pass over header lines). Its default is 0 (no 
header). 

• usecols: A sequence of column indexes determining which columns of the file 
to return as data; by default it is None, meaning all columns will be parsed and 
returned. 

• unpack: By default, the data table is returned in a single array of rows and 
columns reflecting the structure of the file read in. Setting unpack=True will 
transpose this array so that individual columns can be picked off and assigned to 
different variables. 

• ndmin: The minimum number of dimensions the returned array should have. By 
default, 0 (so a file containing a single number is read in as a scalar), it can be set 
to 1 or 2. 

For example, to read the first, third and fourth columns from the file data.txt into 
three separate one-dimensional arrays: 

col1, col3, col4 = np.loadtxt('data.txt', usecols=(0,2,3), unpack=True) 
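
To try this without a file on disk, the same call can be made on an in-memory text buffer, since fname may be any open file-like object. In this sketch the four-column data are made up for illustration:

```python
import io
import numpy as np

# Hypothetical four-column data standing in for data.txt; the first line is
# a comment and is skipped because it starts with '#'.
text = """# t   x    y    z
0.0  1.0  2.0  3.0
0.5  1.5  2.5  3.5
1.0  2.0  3.0  4.0
"""

# usecols picks columns 0, 2 and 3; unpack=True transposes the result so
# that each column can be assigned to its own variable.
col1, col3, col4 = np.loadtxt(io.StringIO(text), usecols=(0, 2, 3), unpack=True)

print(col1)
print(col3)
print(col4)
```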


Example E6.6 The use of np.loadtxt is best illustrated using an example. Consider 
the following text file of data relating to a (fictional) population of students. This file 
can be downloaded as eg6-a-student-data.txt from scipython.com/eg/aac. 

# Student data collected on 17 July 2014 
# Researcher: Dr Wicks, University College Newbury 

# The following data relate to N = 20 students. It 
# has been totally made up and so therefore is 100% 
# anonymous. 

Subject  Sex  DOB       Height  Weight  BP      VO2max 
(ID)     M/F  dd/mm/yy  m       kg      mmHg    mL.kg-1.min-1 
JW-1     M    19/12/95  1.82    92.4    119/76  39.3 
JW-2     M    11/1/96   1.77    80.9    114/73  35.5 
JW-3     F    2/10/95   1.68    69.7    124/79  29.1 
JW-6     M    6/7/95    1.72    75.5    110/60  45.5 
# JW-7   F    28/3/96   1.66    72.4    101/68  - 
JW-9     F    11/12/95  1.78    82.1    115/75  32.3 
JW-10    F    7/4/96    1.60    -       -       30.1 
JW-11    M    22/8/95   1.72    77.2    97/63   48.8 
JW-12    M    23/5/96   1.83    88.9    105/70  37.7 
JW-14    F    12/1/96   1.56    56.3    108/72  26.0 
JW-15    F    1/6/96    1.64    65.0    99/67   35.7 
JW-16    M    10/9/95   1.63    73.0    131/84  29.9 
JW-17    M    17/2/96   1.67    89.8    101/76  40.2 
JW-18    M    31/7/96   1.66    75.1    -       - 
JW-19    F    30/10/95  1.59    67.3    103/69  33.5 
JW-22    F    9/3/96    1.70    -       119/80  30.9 
JW-23    M    15/5/95   1.97    89.2    124/82  - 
JW-24    F    1/12/95   1.66    63.8    100/78  - 
JW-25    F    25/10/95  1.63    64.4    -       28.0 
JW-26    M    17/4/96   1.69    -       121/82  39. 


Let’s find the average heights of the male and female students. The columns we need 
are the second and fourth, and there’s no missing data in these columns so we can use 
np.loadtxt. First construct a record dtype for the two fields, then read the relevant 
columns after skipping the first nine header lines: 

In [x]: fname = 'eg6-a-student-data.txt' 

In [x]: dtype1 = np.dtype([('gender', '|S1'), ('height', 'f8')]) 

In [x]: a = np.loadtxt(fname, dtype=dtype1, skiprows=9, usecols=(1,3)) 

In [x]: a 
Out[x]: 
array([(b'M', 1.8200000524520874), (b'M', 1.7699999809265137), 
       (b'F', 1.6799999475479126), (b'M', 1.7200000286102295), 
       ..., 
       (b'M', 1.690000057220459)], 
      dtype=[('gender', 'S1'), ('height', '<f8')]) 

To find the average heights of the male students, we only want to index the records 
with the gender field as M, for which we can create a boolean array: 

In [x]: m = a['gender'] == b'M' 
In [x]: m 

Out[x]: array([ True, True, False, True, ..., True], dtype=bool) 


m has entries that are True or False for each of the 19 valid records (one is commented 
out) according to whether the student is male or female. So the heights of the male 
students can be seen to be: 

In [x]: print(a['height'][m]) 

[ 1.82000005 1.76999998 1.72000003 1.72000003 1.83000004 1.63 
  1.66999996 1.65999997 1.97000003 1.69000006] 

Therefore, the averages we need are 

In [x]: m_av = a['height'][m].mean() 

In [x]: f_av = a['height'][~m].mean()     # ➊ 

In [x]: print('Male average: {:.2f} m, Female average: {:.2f} m'.format(m_av, f_av)) 

Male average: 1.75 m, Female average: 1.65 m 

➊ Note that ~m (“not m”) is the inverse boolean array of m. 

To perform the same analysis on the student weights we have a bit more work to do 
because there are some missing values (denoted by ‘-’). We could use np.genfromtxt 
(see Section 6.2.3), but let’s write a converter method instead. We’ll replace the missing 
values with the nicely unphysical value of -99. The function parse_weight expects 
a string argument and returns a float: 

def parse_weight(s): 
    try: 
        return float(s) 
    except ValueError: 
        return -99. 

This is the function we want to pass as a converter for column 4: 

In [x]: dtype2 = np.dtype([('gender', '|S1'), ('weight', 'f8')]) 

In [x]: b = np.loadtxt(fname, dtype=dtype2, skiprows=9, usecols=(1,4), 
   ...:                converters={4: parse_weight}) 

Now mask off the invalid data and index the array with a boolean array as before: 

In [x]: mv = b['weight'] > 0 # elements only True for valid data 

In [x]: m_wav = b['weight'][mv & m].mean() # valid and male 

In [x]: f_wav = b['weight'][mv & ~m].mean() # valid and female 

In [x]: print('Male average: {:.2f} kg, ' 
   ...:       'Female average: {:.2f} kg'.format(m_wav, f_wav)) 

Male average: 82.44 kg, Female average: 66.94 kg 
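
The converter technique can be tried in isolation on a small in-memory table. The sketch below uses made-up labels and values (and a hypothetical function name, parse_value) rather than the student data file; it shows the sentinel value being inserted and then masked out before averaging:

```python
import io
import numpy as np

def parse_value(s):
    """Convert a field to float, returning the sentinel -99. if it is not a number."""
    try:
        return float(s)
    except ValueError:
        return -99.

# Hypothetical two-column data: a label and a measurement, with '-' marking
# a missing measurement.
text = """A 1.5
B -
C 2.5
"""

# Only column 1 is read; the converter is keyed by the column's index in the file.
values = np.loadtxt(io.StringIO(text), usecols=(1,), converters={1: parse_value})

valid = values > 0                  # True only where a real measurement was read
mean_valid = values[valid].mean()

print(values)
print(mean_valid)
```

A sentinel only works if it cannot be mistaken for real data, which is why a negative value is a reasonable choice for weights.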

Finally, let’s read in the blood pressure data. Here we have a problem, because the 
systolic and diastolic pressures are not separated by whitespace but by a forward slash 
(/). One solution is to reformat each line to replace the slash with a space before it is 
fed to np.loadtxt. Recall that fname can be a generator instead of a filename or open 
file: we write a suitable generator function, reformat_lines, which takes an open file 
object and yields its lines to np.loadtxt, one by one, after the replacement. This is 
going to mess with the column numbering because it has the side effect of splitting up 
the birth dates into three columns, so in our reformatted lines the blood pressure values 
are now in the columns indexed at 7 and 8. 

Listing 6.4 Reading the blood pressure column 


# eg6-a-read-bp.py 
import numpy as np 

fname = 'eg6-a-student-data.txt' 
dtype3 = np.dtype([('gender', '|S1'), ('bps', 'f8'), ('bpd', 'f8')]) 

def parse_bp(s): 
    try: 
        return float(s) 
    except ValueError: 
        return -99. 

def reformat_lines(fi): 
    for line in fi: 
        line = line.replace('/', ' ') 
        yield line 

with open(fname) as fi: 
    gender, bps, bpd = np.loadtxt(reformat_lines(fi), dtype3, skiprows=9, 
                usecols=(1,7,8), converters={7: parse_bp, 8: parse_bp}, 
                unpack=True) 

# now do something with the data ... 


6.2.3 np.genfromtxt 

NumPy’s genfromtxt function is similar to np.loadtxt but has a few more options 
and is able to cope with missing data. 

The following arguments to this function are the same as for np.loadtxt: fname 
(the only required argument), dtype, comments, converters, usecols and unpack. 


Headers and footers 

Instead of np.loadtxt’s skiprows, the np.genfromtxt function has two optional 
arguments, skip_header and skip_footer, giving the number of lines to skip at the 
beginning and the end of the file, respectively. 

Fixed-width fields 

The delimiter argument works the same as for np.loadtxt but can also be provided 
as a sequence of integers giving the widths of each field to be read in where the data 
does not have delimiters. For example, suppose the following text file, data.txt, is to 
be interpreted as consisting of four columns with widths 2, 1, 9 and 3 characters: 

 12  100.231.03 
 11 1201.842.04 
 11   99.324.02 

so that the first row is to be split: ' 1', '2', '  100.231', '.03'. There is no 
delimiter character, so this isn’t possible with np.loadtxt, but with np.genfromtxt: 

In [x]: np.genfromtxt(fname='data.txt', delimiter=[2,1,9,3], 
   ...:               dtype='i4, i4, f8, f8') 
Out[x]: 
array([(1, 2, 100.231, 0.03), (1, 1, 1201.842, 0.04), (1, 1, 99.324, 0.02)], 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<f8'), ('f3', '<f8')]) 

as required. 
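
The fixed-width read can be reproduced from an in-memory buffer. This sketch assumes the three lines are laid out exactly as 15-character rows (with the leading spaces shown in the string literals), since fixed-width parsing depends on every character position:

```python
import io
import numpy as np

# Each row is 2 + 1 + 9 + 3 = 15 characters wide; the leading spaces matter.
text = (" 12  100.231.03\n"
        " 11 1201.842.04\n"
        " 11   99.324.02\n")

# A list of integers as delimiter gives the width of each field in turn.
data = np.genfromtxt(io.StringIO(text), delimiter=[2, 1, 9, 3],
                     dtype='i4, i4, f8, f8')

print(data)
print(data['f2'])    # the third field of every row
```

Because no names were given, the fields take the default names f0, f1, f2 and f3.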

Missing data 

If a data set is incomplete, np.loadtxt will be unable to parse the fields with missing 
data into valid values for the array and will raise an exception. np.genfromtxt, 
however, sets missing or invalid entries equal to the default values given in 
Table 6.5. 

For example, the comma-separated file here has two ways of indicating missing data: 
empty fields and entries with ‘???’: 
Table 6.5 Default filling values for missing data used by genfromtxt 

Data type    Default value 
int          -1 
float        np.nan 
bool         False 
complex      np.nan + 0.j 


10.1,4,-0.1,2 
10.2,4,,0 
10.3,???,,4 
10.4,2,0., 
10.5,-1,???,3 

Accordingly, np.genfromtxt sets the missing fields to its defaults: 

In [x]: data = np.genfromtxt(fname='data.txt', dtype='f8, i4, f8, i4', 
   ...:                      delimiter=',') 

In [x]: print(data) 

[(10.1, 4, -0.1, 2) (10.2, 4, nan, 0) (10.3, -1, nan, 4) (10.4, 2, 0.0, -1) 

(10.5, -1, nan, 3)] 

The missing_values and filling_values arguments allow closer control 
over which default values to use for which columns. If missing_values is given 
as a sequence of strings, each string is associated with a column in the data file, 
in order; if given as a dictionary of string values, the keys denote either column 
indexes (if they are integers) or column names (if they are strings). The corresponding 
argument, filling_values, maps these column indexes or names to default values. 
If filling_values is provided as a single value, this value is used for missing data 
in all columns. 

For example, to replace the invalid values in column 1 (indicated by '???') with 
999, the missing or invalid values in column 2 (also indicated by '???') with -99 and 
the missing values in column 3 with 0: 

In [x]: data = np.genfromtxt(fname='data.txt', dtype='f8, i4, f8, i4', 
   ...:            delimiter=',', missing_values={1: '???', 2: '???'}, 
   ...:            filling_values={1: 999, 2: -99., 3: 0}) 

In [x]: print(data) 

[(10.1, 4, -0.1, 2) (10.2, 4, -99.0, 0) (10.3, 999, -99.0, 4) 

(10.4, 2, 0.0, 0) (10.5, -1, -99.0, 3)] 

Note in particular how the missing entry in the second column has been replaced by 
999 instead of the default -1; this would be particularly important if -1 is a valid value 
for this column (however, it is now up to the rest of your code to recognize and know 
what to do with values such as 999). 15 


15 For more advanced handling of missing values, see the genfromtxt documentation for details on the 
usemask argument and masked arrays in general. 
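
As a brief illustration of the usemask argument, the following sketch reads a small made-up table with one empty field. With usemask=True, np.genfromtxt returns a masked array in which the missing entry is flagged rather than replaced by a filling value the rest of the code must recognize:

```python
import io
import numpy as np

# Hypothetical data: the middle field of the second row is missing.
text = "1.0,2.0,3.0\n4.0,,6.0\n"

data = np.genfromtxt(io.StringIO(text), delimiter=',', usemask=True)

print(data.mask)     # True marks the missing entry
print(data.mean())   # masked entries are left out of the calculation
```

The mean here is taken over the five valid entries only, (1 + 2 + 3 + 4 + 6)/5 = 3.2.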
Column names 

The argument names provides a way of setting names for the columns of data read in. If 
it is the boolean value True, the names are read from the first valid line after the number 
of lines skipped over specified by the skip_header argument; if names is a 
comma-separated string of names or a sequence of strings, those strings will be used as names. 
By default, names is None and the field names are taken from the dtype, if given. 
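
Both ways of naming columns can be sketched with a small in-memory file. The header line and the replacement names ('time', 'position') below are invented for illustration:

```python
import io
import numpy as np

# Hypothetical comma-separated data with a one-line header.
text = "t,x\n0.0,1.0\n0.5,1.5\n"

# names=True: take the field names from the first valid line of the file.
from_header = np.genfromtxt(io.StringIO(text), delimiter=',', names=True)

# names as a string: skip the header line and supply our own field names.
explicit = np.genfromtxt(io.StringIO(text), delimiter=',', skip_header=1,
                         names='time,position')

print(from_header.dtype.names, from_header['x'])
print(explicit.dtype.names, explicit['position'])
```

In either case the result is a structured array whose columns are accessed by name.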


Example E6.7 In an experiment to investigate the Stroop effect, a group of students 
were timed reading out 25 randomly ordered color names, first in black ink and then in 
a color other than the one they name (e.g., the word “red” in blue ink). The results are 
presented in the text file stroop.txt. Missing data are indicated by the character X. 

Subject Number,Gender,Time (words in black),Time (words in color) 
1,F,18.72,31.11 
2,F,21.14,52.47 
3,F,19.38,33.92 
4,M,22.03,50.57 
5,M,21.41,29.63 
6,M,15.18,24.86 
7,F,14.13,33.63 
8,F,19.91,42.39 
9,F,X,43.60 
10,F,26.56,42.31 
11,F,19.73,49.36 
12,M,18.47,31.67 
13,M,21.38,47.28 
14,M,26.05,45.07 
15,F,X,X 
16,F,15.77,38.36 
17,F,15.38,33.07 
18,M,17.06,37.94 
19,M,19.53,X 
20,M,23.29,49.60 
21,M,21.30,45.56 
22,M,17.12,42.99 
23,F,21.85,51.40 
24,M,18.15,36.95 
25,M,33.21,61.59 

We can read in this data with np.genfromtxt and summarize the results with the 
code here. 

Listing 6.5 Analyzing data from a Stroop effect experiment 

# eg6-stroop.py 
import numpy as np 

# Read in the data from stroop.txt, identifying missing values and 
# replacing them with NaN 
data = np.genfromtxt('stroop.txt', skip_header=1,            # ➊ 
                     dtype=[('student','u8'), ('gender','S1'), 
                            ('black','f8'), ('color','f8')], 
                     delimiter=',', 
                     missing_values='X') 
nwords = 25 

# Remove invalid rows from data set 
filtered_data = data[np.isfinite(data['black']) & np.isfinite(data['color'])]   # ➋ 

# Extract rows by gender (M/F) and word color (black/color) and normalize 
# to time taken per word 
fb = filtered_data['black'][filtered_data['gender']==b'F'] / nwords 
mb = filtered_data['black'][filtered_data['gender']==b'M'] / nwords 
fc = filtered_data['color'][filtered_data['gender']==b'F'] / nwords 
mc = filtered_data['color'][filtered_data['gender']==b'M'] / nwords 

# Produce statistics: mean and standard deviation by gender and word color 
mu_fb, sig_fb = np.mean(fb), np.std(fb) 
mu_fc, sig_fc = np.mean(fc), np.std(fc) 
mu_mb, sig_mb = np.mean(mb), np.std(mb) 
mu_mc, sig_mc = np.mean(mc), np.std(mc) 

print('Mean and (Standard deviation) times per word (sec)') 
print('gender | black | color | difference') 
print(' F | {:4.3f} ({:4.3f}) | {:4.3f} ({:4.3f}) | {:4.3f}' 
      .format(mu_fb, sig_fb, mu_fc, sig_fc, mu_fc - mu_fb)) 
print(' M | {:4.3f} ({:4.3f}) | {:4.3f} ({:4.3f}) | {:4.3f}' 
      .format(mu_mb, sig_mb, mu_mc, sig_mc, mu_mc - mu_mb)) 


➊ In the absence of any provided filling_values, np.genfromtxt will replace 
the invalid fields with np.nan. 

➋ We only want to consider students with times for both parts of the experiment, so 
create a filtered data set here. 

The output shows a significantly slower per-word speed for the false-colored words 
than for the words in black: 

Mean and (Standard deviation) times per word (sec) 
gender | black | color | difference 

F | 0.770 (0.137) | 1.632 (0.306) | 0.862 

M | 0.849 (0.186) | 1.679 (0.394) | 0.830 


6.2.4 Exercises 

Problems 

P6.2.1 The following text file gives some data concerning the 8,000 m peaks, in 
alphabetical order. 

ex6-2-b-mountain-data.txt This file contains a list of the 14 
highest mountains in the world with their names, height, year 
of first ascent, year of first winter ascent, and location as 
longitude and latitude in degrees (d), minutes (m) and seconds 
(s). Note: as of 2013, no winter ascent has been made of K2 or 
Nanga Parbat. 
Name            Height  First ascent  First winter  Location 
                m       date          ascent date   (WGS84) 

Annapurna I     8091    3/6/1950      3/2/1987      28d35m46sN  83d49m13sE 
Broad Peak      8051    9/6/1957      5/3/2013      35d48m39sN  76d34m06sE 
Cho Oyu         8201    19/10/1954    12/2/1985     28d05m39sN  86d39m39sE 
Dhaulagiri I    8167    13/5/1960     21/1/1985     27d59m17sN  86d55m31sE 
Everest         8848    29/5/1953     17/2/1980     27d59m17sN  86d55m31sE 
Gasherbrum I    8080    5/7/1958      9/3/2012      35d43m28sN  76d41m47sE 
Gasherbrum II   8034    7/7/1956      2/2/2011      35d45m30sN  76d39m12sE 
K2              8611    31/7/1954     -             35d52m57sN  76d30m48sE 
Kangchenjunga   8568    25/5/1955     11/1/1986     27d42m09sN  88d08m54sE 
Lhotse          8516    18/5/1956     31/12/1988    27d57m42sN  86d56m00sE 
Makalu          8485    15/5/1955     9/2/2009      27d53m21sN  87d05m19sE 
Manaslu         8163    9/5/1956      12/1/1984     28d33m0sN   84d33m35sE 
Nanga Parbat    8126    3/7/1953      -             35d14m15sN  74d35m21sE 
Shishapangma    8027    2/5/1964      14/1/2005     28d21m8sN   85d46m47sE 

Use NumPy’s loadtxt method to read these data into a suitable structured array to 
determine the following: 

1. The lowest 8,000 m peak 

2. The most northerly, easterly, southerly and westerly peaks 

3. The most recent first ascent of the peaks 

4. The first of the peaks to be climbed in winter 

Also produce another structured array containing a list of mountains with their height 
in feet and first ascent date, ordered by increasing height. 16 

P6.2.2 The file busiest_airports.txt, available to download from 
scipython.com/ex/afa, provides details of the 30 busiest airports in the world in 2014. 
The tab-delimited fields are: three-letter IATA code, airport name, airport location, 
latitude and longitude (both in degrees). 

Write a program to determine the distance between two airports identified by their 
three-letter IATA code, using the Haversine formula (see, for example, Exercise 4.4.2) 
and assuming a spherical Earth of radius 6378.1 km. 


P6.2.3 The World Bank provides an extensive collection of data sets on a wide 
range of “indicators,” which is searchable at http://data.worldbank.org/. Data sets 
concerning child immunization rates for BCG (against tuberculosis), Pol3 (Polio) and 
measles in three South-East Asian countries between 1960 and 2013 are available at 
scipython.com/ex/afb. Fields are delimited by semicolons and missing values are 
indicated by '..'. 

Use NumPy methods to read in this data and create three plots (one for each vaccine) 
comparing immunization rates in the three countries. 


16 1 metre = 3.2808399 feet. 
6.3 Statistical methods 

NumPy provides several methods for performing statistical analysis, either on an entire 
array or an axis of it. 


6.3.1 Ordering statistics 

Maxima and minima 

We have already used np.min and np.max to find the minimum and maximum values 
of an array (these methods are also available using the names np.amin and np.amax). 
If the array contains one or more NaN values, the corresponding minimum or maximum 
value will be np.nan. To ignore NaN values instead, use np.nanmin and np.nanmax: 

In [x]: a = np.sqrt(np.linspace(-2, 2, 5)) 

In [x]: print(a) 
[        nan         nan  0.          1.          1.41421356] 

In [x]: np.min(a), np.max(a) 
Out[x]: (nan, nan) 

In [x]: np.nanmin(a), np.nanmax(a) 
Out[x]: (0.0, 1.4142135623730951) 

We have also met the functions np.argmin and np.argmax, which return the index 
of the minimum and maximum values in an array; they too have np.nanargmin and 
np.nanargmax variants: 

In [x]: np.argmin(a), np.argmax(a) 

Out[x]: (0, 0) # The first nan in the array 

In [x]: np.nanargmin(a), np.nanargmax(a) 

Out[x]: (2, 4) # The indexes of 0, 1.41421356 

The related methods, np.fmin / np.fmax and np.minimum / np.maximum, 
compare two arrays, element by element, and return another array of the same shape. The 
first pair of methods ignores NaN values and the second pair propagates them into the 
output array. For example, 

In [x]: np.fmin([1, -5, 6, 2], [0, np.nan, -1, -1]) 

array([ 0., -5., -1., -1.]) # NaNs are ignored 

In [x]: np.maximum([1, -5, 6, 2], [0, np.nan, -1, -1]) 
array([ 1., nan, 6., 2.]) # NaNs are propagated 
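
The NaN-propagating and NaN-ignoring variants can be compared side by side. This sketch uses two small made-up arrays; note that where both inputs are NaN, even np.fmin has nothing else to return:

```python
import numpy as np

# Illustrative arrays containing NaNs in different positions.
a = np.array([3.0, np.nan, -1.0, 2.0])
b = np.array([np.nan, np.nan, 0.0, 5.0])

# Reductions over one array: np.min propagates NaN, np.nanmin skips it.
reduced = (np.min(a), np.nanmin(a), np.nanargmin(a))

# Elementwise comparison of two arrays.
propagated = np.minimum(a, b)   # NaN wherever either input is NaN
ignored = np.fmin(a, b)         # NaN only where both inputs are NaN

print(reduced)
print(propagated)
print(ignored)
```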


Percentiles 

The np.percentile method returns a specified percentile, q, of the data along an axis 
(or along a flattened version of the array if no axis is given). The minimum of an array is 
the value at q=0 (0th percentile), the maximum is the value at q=100 (100th percentile) 
and the median is the value at q=50 (50th percentile). Where no single value in the array 
corresponds to the requested value of q exactly, a weighted average of the two nearest 
values is used. For example, 

In [x]: a = np.array([[0., 0.6, 1.2], [1.8, 2.4, 3.0]]) 

In [x]: np.percentile(a, 50) 

1.5 
In [x]: np.percentile(a, 75) 

2.25 

In [x]: np.percentile(a, 50, axis=1) 
array([ 0.6, 2.4]) 

In [x]: np.percentile(a, 75, axis=1) 
array([ 0.9, 2.7]) 
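
The “weighted average of the two nearest values” can be made concrete with a hand-written version of the calculation. The helper below, percentile_linear, is a hypothetical name introduced for this sketch; it assumes the linear interpolation scheme that np.percentile uses by default:

```python
import numpy as np

def percentile_linear(values, q):
    """Percentile of the flattened data by linear interpolation between neighbors."""
    x = np.sort(np.ravel(values))
    pos = q / 100 * (len(x) - 1)      # fractional index into the sorted data
    lo = int(np.floor(pos))           # index of the value just below
    hi = min(lo + 1, len(x) - 1)      # index of the value just above
    frac = pos - lo                   # weight given to the upper value
    return x[lo] + frac * (x[hi] - x[lo])

a = np.array([[0., 0.6, 1.2], [1.8, 2.4, 3.0]])
manual = percentile_linear(a, 75)

print(manual, np.percentile(a, 75))
```

For q=75 the fractional index is 3.75, so the result lies three-quarters of the way from 1.8 to 2.4.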


6.3.2 Averages, variances and correlations 

Averages 


In addition to np.mean, which calculates the arithmetic mean of the values along 
a specified axis of an array, NumPy provides methods for calculating the weighted 
average, median, standard deviation and variance. The weighted average is calculated as 

$$\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i},$$ 

where the weights, $w_i$, are supplied as a sequence the same length as the array. For 
example, 

In [x]: x = np.array([1., 4., 9., 16.]) 

In [x]: np.mean(x) 

7.5 

In [x]: np.median(x) 

6.5 

In [x]: np.average(x, weights=[0., 3., 1., 0.]) 

5.25 # ie (3. *4. + 1. *9.) / (3. + 1.) 

If you want the sum of the weights as well as the weighted average, set the returned 
argument to True. In the following example, we do this and find the weighted averages 
in each row (axis=1 averages values across columns of a two-dimensional array): 

In [x]: x = np.array( [[1., 8., 27], [-0.5, 1., 0.]] ) 

In [x]: av, sw = np.average(x, weights=[0., 1., 0.1], axis=1, returned=True) 

In [x]: print(av) 

[ 9.72727273 0.90909091] 

In [x]: print(sw) 

[ 1.1 1.1] 


The averages are therefore (1 x 8 + 0.1 x 27)/1.1 = 9.72727273 and (1 x 1.)/1.1 = 
0.90909091 where 1.1 is the sum of the weights. 
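
The same numbers can be checked against the definition of the weighted average directly. This sketch reuses the array and weights from the example; the names w and manual are introduced here for illustration:

```python
import numpy as np

x = np.array([[1., 8., 27.], [-0.5, 1., 0.]])
w = np.array([0., 1., 0.1])

# Weighted average from its definition: sum(w_i * x_i) / sum(w_i) along each row.
# Broadcasting applies the same weights to both rows.
manual = (x * w).sum(axis=1) / w.sum()

av, sw = np.average(x, weights=w, axis=1, returned=True)

print(manual)
print(av, sw)
```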

Standard deviations and variances 

The function np.std calculates, by default, the uncorrected sample standard deviation: 

$$\sigma_N = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2},$$ 

where $x_i$ are the N observed values in the array and $\bar{x}$ is their mean. To calculate the 
corrected sample standard deviation, 

$$\sigma = \sqrt{\frac{1}{N - \delta}\sum_{i=1}^{N}(x_i - \bar{x})^2},$$ 

pass to the argument ddof the value of $\delta$ such that $N - \delta$ is the number of degrees of 
freedom in the sample. For example, if the sample values are drawn from the population 
independently with replacement and used to calculate $\bar{x}$ there are $N - 1$ degrees of 
freedom in the vector of residuals used to calculate $\sigma$: $(x_1 - \bar{x}, x_2 - \bar{x}, \cdots, x_N - \bar{x})$ and 
so $\delta = 1$. For example, 


In [x]: x = np.array([1., 2., 3., 4.]) 

In [x]: np.std(x) # or x.std(), uncorrected standard deviation 

1.1180339887498949 

In [x]: np.std(x, ddof=1) # corrected standard deviation 
1.2909944487358056 


The function np.nanstd calculates the standard deviation ignoring np.nan values 
(so that N is the number of non-NaN values in the array). NumPy also has methods for 
calculating the variance of the values in an array: np.var and np.nanvar. 
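
The effect of ddof can be verified by writing the calculation out by hand. The helper std_ddof below is a hypothetical function introduced for this sketch; it simply divides the sum of squared residuals by N - ddof before taking the square root:

```python
import numpy as np

def std_ddof(x, ddof=0):
    """Standard deviation with N - ddof in the denominator."""
    x = np.asarray(x, dtype=float)
    residuals = x - x.mean()
    return np.sqrt(np.sum(residuals**2) / (x.size - ddof))

x = np.array([1., 2., 3., 4.])

uncorrected = std_ddof(x)          # matches np.std(x)
corrected = std_ddof(x, ddof=1)    # matches np.std(x, ddof=1)

# np.nanstd skips NaN entries, so N counts only the valid values.
with_nan = np.nanstd(np.array([1., 2., np.nan, 3., 4.]))

print(uncorrected, corrected, with_nan)
```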

The covariance is returned by the np.cov method. In its simplest invocation, it can 
be passed a single two-dimensional array, X, in which the rows represent variables, $x_i$, 
and the columns observations of the value of each variable. np.cov(X) then returns the 
covariance matrix, $C_{ij}$, indicating how variable $x_i$ varies with $x_j$: the element $C_{ij}$ is said 
to be an estimate of the covariance of variables $x_i$ and $x_j$: 

$$C_{ij} = \mathrm{cov}(x_i, x_j) = E[(x_i - \mu_i)(x_j - \mu_j)],$$ 

where $\mu_i$ is the mean of the variable $x_i$ and $E[\,]$ denotes the expected value. If there are 
N observed values for each of the variables, $\mu_i = \frac{1}{N}\sum_k x_{ik}$. The unbiased estimate of 
the covariance is then 

$$C_{ij} = \frac{1}{N - 1}\sum_k (x_{ik} - \mu_i)(x_{jk} - \mu_j).$$ 

This is the default behavior of np.cov, but if the bias argument is set to 1, then N is 
used in the denominator here to give the biased estimate of the covariance. Finally, the 
denominator can be set explicitly to $N - \delta$ by passing $\delta$ as the ddof 
argument of cov. 


Example E6.8 As an example, consider the matrix of five observations each of three
variables, $x_0$, $x_1$ and $x_2$, whose observed values are held in the three rows of the array X:

X = np.array([ [0.1, 0.3, 0.4, 0.8, 0.9],
               [3.2, 2.4, 2.4, 0.1, 5.5],
               [10., 8.2, 4.3, 2.6, 0.9]
             ])


The covariance matrix is a 3 x 3 array of values, 

In [x]: print( np.cov(X) )
[[ 0.115 , 0.0575, -1.2325],
 [ 0.0575, 3.757 , -0.8775],
 [ -1.2325, -0.8775, 14.525 ]]

The diagonal elements, $C_{ii}$, are the variances in the variables $x_i$ assuming $N - 1$
degrees of freedom:

In [x]: print(np.var(X, axis=1, ddof=1))

[ 0.115 3.757 14.525]

Although the magnitude of the covariance matrix elements is not always easy to
interpret (because it depends on the magnitude of the individual observations, which may
be very different for different variables), it is clear that there is a strong anticorrelation
between $x_0$ and $x_2$ ($C_{02} = -1.2325$: as one increases the other decreases) and no
strong correlation between $x_0$ and $x_1$ ($C_{01} = 0.0575$: $x_0$ and $x_1$ do not trend strongly
together).
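The unbiased and biased estimates can be reproduced from the defining sums. This is a minimal sketch using the array X of Example E6.8; the names D, C_unbiased and C_biased are introduced here for illustration only.

```python
import numpy as np

X = np.array([[0.1, 0.3, 0.4, 0.8, 0.9],
              [3.2, 2.4, 2.4, 0.1, 5.5],
              [10., 8.2, 4.3, 2.6, 0.9]])
N = X.shape[1]      # number of observations of each variable

# Subtract each variable's mean from its row of observations
D = X - X.mean(axis=1, keepdims=True)

# Element (i, j) of D @ D.T is the sum over k of the products of deviations
C_unbiased = D @ D.T / (N - 1)
C_biased = D @ D.T / N

print(np.allclose(C_unbiased, np.cov(X)))
print(np.allclose(C_biased, np.cov(X, bias=True)))
print(np.allclose(C_biased, np.cov(X, ddof=0)))
```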


The correlation coefficient matrix is often used in preference to the covariance matrix
as it is normalized by dividing $C_{ij}$ by the product of the variables’ standard deviations:

$$\rho_{ij} = \mathrm{corr}(x_i, x_j) = \frac{C_{ij}}{\sigma_i \sigma_j} = \frac{C_{ij}}{\sqrt{C_{ii} C_{jj}}}.$$

This means that the elements $\rho_{ij}$ have values between $-1$ and 1 inclusive, and the
diagonal elements are $\rho_{ii} = 1$. In our example, using np.corrcoef gives:


In [x]: print( np.corrcoef(X) ) 

[[ 1. 0.0874779 -0.95363007] 

[ 0.0874779 1. -0.11878687] 

[-0.95363007 -0.11878687 1. ]] 


It is easy to see from this correlation coefficient matrix the strong anticorrelation
between $x_0$ and $x_2$ ($\rho_{02} = -0.954$) and the lack of correlation between $x_1$ and the other
variables (e.g., $\rho_{10} = 0.087$).
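The normalization can be carried out by hand from the covariance matrix. The following is a minimal sketch, again using the array X of Example E6.8; sigma and rho are illustrative names.

```python
import numpy as np

X = np.array([[0.1, 0.3, 0.4, 0.8, 0.9],
              [3.2, 2.4, 2.4, 0.1, 5.5],
              [10., 8.2, 4.3, 2.6, 0.9]])

C = np.cov(X)
# Standard deviations are the square roots of the diagonal elements C_ii
sigma = np.sqrt(np.diag(C))
# Divide each C_ij by sigma_i * sigma_j
rho = C / np.outer(sigma, sigma)

print(np.allclose(rho, np.corrcoef(X)))
print(np.allclose(np.diag(rho), 1.0))
```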

Both the np.cov and np.corrcoef methods can take a second array-like object
containing a further set of variables and observations, so they can be called on a pair of
one-dimensional arrays without stacking them into a single matrix:

In [x]: x = np.array([1., 2., 3., ...])
In [x]: y = np.array([0.08, 0.31, ..., 0.62])
In [x]: print( np.corrcoef(x,y) )
[[ 1.          0.97787645]
 [ 0.97787645  1.        ]]


That is 

np.corrcoef(x, y) 

is a convenient alternative to 

np.corrcoef(np.vstack((x,y))) 


Finally, if your observations happen to be in the rows of your matrix, with the variables
corresponding to the columns (instead of the other way round), there is no need
to transpose the matrix: just pass rowvar=0 to either np.cov or np.corrcoef and
NumPy will take care of it for you.
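As a quick check of this equivalence, the sketch below stores the data of Example E6.8 with observations in rows (the name obs is an assumption made for illustration) and compares the results with those obtained from the original layout.

```python
import numpy as np

X = np.array([[0.1, 0.3, 0.4, 0.8, 0.9],
              [3.2, 2.4, 2.4, 0.1, 5.5],
              [10., 8.2, 4.3, 2.6, 0.9]])

# The same data with one observation per row and one variable per column
obs = X.T

print(np.allclose(np.cov(obs, rowvar=False), np.cov(X)))
print(np.allclose(np.corrcoef(obs, rowvar=False), np.corrcoef(X)))
```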


Example E6.9 The Cambridge University Digital Technology Group have been recording
the weather from the roof of their department building since 1995 and make the data
available to download in a single CSV file at www.cl.cam.ac.uk/research/dtg/weather/.

The following program determines the correlation coefficient between pressure and 
temperature at this site. 

Listing 6.6 Calculating the correlation coefficient between air temperature and pressure 

# eg6-pT.py
import numpy as np
import pylab

data = np.genfromtxt('weather-raw.csv', delimiter=',', usecols=(1,4))

# Remove any rows with either missing T or missing p
data = data[~np.any(np.isnan(data), axis=1)]

# Temperatures are reported after multiplication by a factor of 10 so remove
# this factor
data[:,0] /= 10

# Get the correlation coefficient
corr = np.corrcoef(data, rowvar=0)[0,1]
print('p-T correlation coefficient: {:.4f}'.format(corr))

# Plot the data on a scatterplot: T on x-axis, p on y-axis.
pylab.scatter(*data.T, marker='.')
pylab.xlabel(r'$T$ /$\mathrm{^\circ C}$')
pylab.ylabel('$p$ /mbar')
pylab.show()


The output (Figure 6.4) gives a correlation coefficient of 0.0260: as expected, there
is little correlation between air temperature and pressure (since the air density also
varies).

Figure 6.4 There is virtually no correlation between air temperature and air pressure in this data
set.


6.3.3 Histograms

The NumPy function np.histogram creates a histogram from the values in an array.
That is, a set of bins is defined with lower and upper limits and each is filled with the
number of elements from the array whose value falls within its limits. For example,
suppose the following array holds the percentage marks of 10 students in a test:

In [x]: marks = np.array([45, 68, 56, 23, 60, 87, 75, 59, 63, 72])

There are several ways to define the histogram bins. If the bins argument is a sequence,
it defines the boundaries of the sequential bins:

In [x]: bins = [20, 40, 60, 80, 100]

defines four bins with ranges [20-40%), [40-60%), [60-80%) and [80-100%].
All but the last bin are half open; that is, the first bin includes marks from and including
20% up to but not including 40%. Note that a sequence of N + 1 numbers is required
to create N bins. The np.histogram method returns a tuple consisting of the values of
the histogram and the bin edges we defined (both as NumPy arrays).


In [x]: hist, bins = np.histogram(marks, bins)
In [x]: hist
Out[x]: array([1, 3, 5, 1])
In [x]: bins
Out[x]: array([ 20,  40,  60,  80, 100])


This shows that there is one mark in the 20 - 40% bin, three in the 40 - 60% bin and 
so on. 

If you just want a certain number of evenly spaced bins, an integer can be passed as 
bins instead of a sequence: 

In [x]: np.histogram(marks, bins=5)
Out[x]: (array([1, 1, 3, 3, 2]),
         array([ 23. , 35.8, 48.6, 61.4, 74.2, 87. ]))

By default, the requested number of bins spans the range between the minimum and maximum
values of the array (here, 23 and 87); to specify a different minimum and maximum, set
the range argument tuple:

In [x]: np.histogram(marks, bins=5, range=(0,100))
Out[x]: (array([0, 1, 3, 5, 1]),
         array([ 0., 20., 40., 60., 80., 100.]))

Figure 6.5 An example histogram (x-axis: Mark).

The np.histogram method also has an optional boolean argument density: by
default it is False, meaning that the histogram array returned contains the number of
values from the original array in each bin. If density is set to True, the histogram
array will contain the probability density function, normalized so that the integral over
the entire range of the bins is equal to unity:

In [x]: hist, bins = np.histogram(marks, bins=5, range=(0,100), 

density=True) 

In [x]: print(hist) 

[ 0. 0.005 0.015 0.025 0.005] 

In [x]: bin_width = 100/5 

In [x]: print(np.sum(hist) * bin_width) 

1.0 

(By integral here we mean the area under the histogram, which is the sum of each 
histogram bar height times its corresponding bin width.) 
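When the bins have unequal widths, each density value must be multiplied by its own bin width. The sketch below illustrates this with the marks array used above; the bin edges are hypothetical and chosen only to give unequal widths.

```python
import numpy as np

marks = np.array([45, 68, 56, 23, 60, 87, 75, 59, 63, 72])
edges = [0, 40, 50, 60, 70, 100]    # hypothetical bins of unequal width

counts, _ = np.histogram(marks, bins=edges)
dens, edges_out = np.histogram(marks, bins=edges, density=True)
widths = np.diff(edges_out)

# Each density value is the count divided by (total count * bin width)
print(np.allclose(dens, counts / (counts.sum() * widths)))
# The area under the histogram is the sum of height * width over all bins
print(np.isclose(np.sum(dens * widths), 1.0))
```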

To plot a histogram with pylab, use pylab.hist, passing it the same arguments
you would to np.histogram: 17

In [x]: hist, bins, patches = pylab.hist(marks, bins=5, range=(0,100))    # ➊
In [x]: hist, bins
Out[x]:
(array([ 0., 1., 3., 5., 1.]),
 array([ 0., 20., 40., 60., 80., 100.]))

In [x]: pylab.show()

➊ In addition to the bin counts (hist) and boundaries (bins), pylab returns a list
of references to the “patches” which appear in the plotted figure (see Section 7.1.5 for
more information about this advanced feature).

The resulting histogram is plotted in Figure 6.5. See also Sections 3.3.2 and 7.1.2. 


17 Note that the density argument is not supported as of Matplotlib 1.3.1: instead, set normed=True for a
probability density plot.

6.3.4 Exercises

Problems 

P6.3.1 A certain lottery involves players selecting six numbers without replacement
from the range [1,49]. The jackpot is shared among the players who match all six
numbers (“balls”) selected in the same way at random in a twice-weekly draw (in any
order). If no player matches every drawn number, the jackpot “rolls over” and is added
to the following draw’s jackpot.

Although the lottery is fair in the sense that every combination of drawn numbers 
is equally likely, it has been observed that many players show a preference in their 
selection for certain numbers, such as those that represent dates (i.e., more of their 
numbers are chosen from [1,31] than would be expected if they chose randomly). Hence, 
to avoid sharing the jackpot and hence to maximize one’s expected winnings, it would 
be reasonable to avoid these numbers. 

Test this hypothesis by establishing if there is any correlation between the number 
of balls with values less than 13 (representing a month) and the jackpot winnings 
per person. Ignore draws immediately following a rollover. The necessary data can be
downloaded from scipython.com/ex/afe.

P6.3.2 We have seen how to create a histogram plot from an array with pylab.hist,
but suppose you have already created arrays hist and bins using np.histogram and want
to plot the resulting histogram from these arrays. You can’t use pylab.hist because
this function expects to act on the original array of data. Use pylab.bar 18 to plot a
hist array as a bar chart.

P6.3.3 The heights, in cm, of a sample of 1,000 adult men and 1,000 adult women
from a certain population are collected in the data files ex6-3-f-male-heights.txt
and ex6-3-f-female-heights.txt available at scipython.com/ex/afd. Read in the
data and establish the mean and standard deviation for each sex. Create histograms for
the two data sets using a suitable binning interval and plot them on the same figure.

Repeat the exercise in imperial units (feet and inches). 

6.4 Polynomials

NumPy provides a powerful set of classes for representing polynomials, including methods
for evaluation, algebra, root-finding and fitting of several kinds of polynomial basis
functions. In this section, the simplest and most familiar basis, the power series, will be
described first, before a discussion of a few other classical orthogonal polynomial basis
functions.


18 Documentation for this method is at http://matplotlib.org/api/pyplot_api.html/matplotlib.pyplot.bar; see
also Section 7.1.2.

6.4.1 Defining and evaluating a polynomial

A (finite) polynomial power series has as its basis the powers of x: $1 (= x^0), x, x^2, x^3, \ldots, x^N$,
with coefficients $c_i$:

$$P(x) = \sum_{i=0}^{N} c_i x^i.$$

This section describes the use of the Polynomial convenience class, which provides
a natural interface to the underlying functionality of NumPy’s polynomial
package.

The polynomial convenience class is numpy.polynomial.Polynomial. To import
it directly, use

In [x]: from numpy.polynomial import Polynomial 

Alternatively, if the whole NumPy library is already imported as np, then rather than
constantly refer to this class as np.polynomial.Polynomial, it is convenient to
define a variable:

In [x]: import numpy as np 

In [x]: Polynomial = np.polynomial.Polynomial 

This is the way we will refer to the Polynomial class in this book. 

To define a polynomial object, pass the Polynomial constructor a sequence of coefficients
to increasing powers of x, starting with $c_0$. For example, to represent the polynomial

$$P(x) = 6 - 5x + x^2,$$

define the object

In [x]: p = Polynomial([6, -5, 1])

You can inspect the coefficients of a Polynomial object with print or by referring to
its coef attribute.

In [x]: print(p)
poly([ 6. -5. 1.])

In [x]: p.coef
Out[x]: array([ 6., -5., 1.])

Notice that the integer coefficients used to define the polynomial have been automatically
cast to float. It is also possible to use complex coefficients.

To evaluate a polynomial for a given value of x, “call” it as follows: 

In [x]: p(4)    # calculate p at a single value of x
2.0

In [x]: x = np.linspace(-5, 5, 11)

In [x]: print(p(x))    # calculate p on a sequence of x values
[56. 42. 30. 20. 12. 6. 2. 0. 0. 2. 6.]

6.4.2 Polynomial algebra

The Polynomial convenience class implements the familiar Python operators
+, -, *, //, **, % and divmod 19 on Polynomial objects. These are illustrated in the following
examples using the polynomials

$$P(x) = 6 - 5x + x^2$$
$$Q(x) = 2 - 3x$$

In [x]: p = Polynomial([6, -5, 1])

In [x]: q = Polynomial([2, -3])

In [x]: print(p + q)
poly([ 8. -8. 1.])

In [x]: print(p - q)
poly([ 4. -2. 1.])

In [x]: print(p * q)
poly([ 12. -28. 17. -3.])

In [x]: print(p // q)
poly([ 1.44444444 -0.33333333])

In [x]: print(p % q)
poly([ 3.11111111])    # i.e. 28/9

Division of a polynomial by another polynomial is analogous to integer division (and
uses the same // operator): that is, the result is another polynomial (with no reciprocal
powers of x), possibly leaving a remainder.

Hence $p = q\left(-\tfrac{1}{3}x + \tfrac{13}{9}\right) + \tfrac{28}{9}$, and the // operator returns the quotient polynomial,
$-\tfrac{1}{3}x + \tfrac{13}{9}$. The remainder (which, in general, will be another polynomial) is returned,
as might be expected, by the modulus operator, %. The divmod() built-in returns both
quotient and remainder in a tuple:

In [x]: quotient, remainder = divmod(p, q) 

In [x]: print(quotient) 

poly([ 1.44444444 -0.33333333])    # i.e. p(x) // q(x) is 13/9 - x/3

In [x]: print(remainder) 
poly([ 3.11111111]) 
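The division identity, p = q × quotient + remainder, can be verified numerically. This is a minimal sketch using the same p and q; the name rebuilt is introduced here for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

p = Polynomial([6, -5, 1])
q = Polynomial([2, -3])
quotient, remainder = divmod(p, q)

# Exact rational values expected from long division by hand
print(np.allclose(quotient.coef, [13/9, -1/3]))
print(np.allclose(remainder.coef, [28/9]))

# Rebuild p from the quotient and remainder
rebuilt = q * quotient + remainder
print(np.allclose(rebuilt.coef, p.coef))
```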

Exponentiation is supported through the ** operator; polynomials can only be raised
to a non-negative integer power:

In [x]: print(q ** 2) 
poly([ 4. -12. 9.]) 

It isn’t always convenient to create a new polynomial object in order to use these 
operators on one another, so many of the operators described here also work with 
scalars: 


19 The divmod function returns the quotient and remainder of a division operation as a tuple. 




In [x]: print(p * 2)    # multiplication by a scalar
poly([ 12. -10. 2.])

In [x]: print(p / 2)    # division by a scalar
poly([ 3. -2.5 0.5])

and even tuples, lists and arrays of polynomial coefficients. For example, to multiply
$P(x)$ by $x^2 - 2x^3$:

In [x]: print(p * [0, 0, 1, -2])
poly([ 0. 0. 6. -17. 11. -2.])

Finally, one polynomial can be substituted into another. To evaluate $P(Q(x))$, simply
use p(q):

In [x]: print(p(q))
poly([ 0. 3. 9.])

That is, $P(Q(x)) = 3x + 9x^2$.


6.4.3 Root-finding 

The roots of a polynomial are returned by the roots method. Repeated roots are simply 
repeated in the returned array: 

In [x]: p.roots() 
array([ 2., 3.]) 

In [x]: (q*q).roots() 

array([ 0.66666667, 0.66666667]) 

In [x]: Polynomial([5, 4, 1]).roots()
array([-2.-1.j, -2.+1.j])

Polynomials can also be created from their roots with Polynomial.fromroots:

In [x]: print( Polynomial.fromroots([-4, 2, 1]) )
poly([ 8. -10. 1. 1.])

That is, $(x + 4)(x - 2)(x - 1) = 8 - 10x + x^2 + x^3$. Note that the way the polynomial
is constructed means that the coefficient of the highest power of x will be 1.
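A round trip between roots and coefficients illustrates both methods. In the sketch below, the scaled polynomial p3 is an illustrative assumption showing that multiplying by a constant changes the leading coefficient but not the roots.

```python
import numpy as np
from numpy.polynomial import Polynomial

r = [-4, 2, 1]
p = Polynomial.fromroots(r)
print(p.coef)

# The roots method recovers the roots (possibly in a different order)
print(np.allclose(np.sort(p.roots()), np.sort(r)))

# Scaling by a constant gives a leading coefficient other than 1
# while leaving the roots unchanged
p3 = 3 * p
print(p3.coef[-1], np.allclose(np.sort(p3.roots()), np.sort(r)))
```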


Example E6.10 The tanks used in the storage of cryogenic liquids and rocket fuel are
often spherical (why?). Suppose a particular spherical tank has a radius R and is filled
with a liquid to a height h. It is (fairly) easy to find a formula for the volume of liquid
from the height:

$$V = \pi R h^2 - \frac{1}{3}\pi h^3.$$

Suppose that there is a constant flow of liquid from the tank at a rate $F = -\mathrm{d}V/\mathrm{d}t$.
How does the height of liquid, h, vary with time? Differentiating the above
equation with respect to t leads to

$$(2\pi R h - \pi h^2)\frac{\mathrm{d}h}{\mathrm{d}t} = -F.$$

If we start with a full tank ($h = 2R$) at time $t = 0$, this ordinary differential equation
may be integrated to yield the equation

$$-\frac{1}{3}\pi h^3 + \pi R h^2 + \left(Ft - \frac{4}{3}\pi R^3\right) = 0,$$

a cubic polynomial in h. Because this equation cannot easily be inverted analytically for h,
let’s use NumPy’s Polynomial class to find h(t), given a tank of radius R = 1.5 m
from which liquid is being drawn at 200 cm³ s⁻¹.

The total volume of liquid in the full tank is $V_0 = \frac{4}{3}\pi R^3$. Clearly, the tank is empty
when h = 0, which occurs at time $T = V_0/F$, since the flow rate is constant. At any
particular time, t, we can find h by finding the roots of this equation.

Listing 6.7 Liquid height in a spherical tank 

# eg6-c-spherical-tank-a.py
import numpy as np
import pylab
Polynomial = np.polynomial.Polynomial

# Radius of the spherical tank in m
R = 1.5
# Flow rate out of the tank, m^3.s-1
F = 2.e-4
# Total volume of the tank
V0 = 4/3 * np.pi * R**3
# Total time taken for the tank to empty
T = V0 / F

# coefficients of the quadratic and cubic terms
# of p(h), the polynomial to be solved for h
c2, c3 = np.pi * R, -np.pi / 3

N = 100
# array of N time points between 0 and T inclusive
time = np.linspace(0, T, N)    # ➊
# create the corresponding array of heights h(t)
h = np.zeros(N)
for i, t in enumerate(time):
    c0 = F*t - V0
    p = Polynomial([c0, 0, c2, c3])
    # find the three roots to this polynomial
    roots = p.roots()    # ➋
    # we want the one root for which 0 <= h <= 2R
    h[i] = roots[(0 <= roots) & (roots <= 2*R)][0]

pylab.plot(time, h, 'o')
pylab.xlabel('Time /s')
pylab.ylabel('Height in tank /m')
pylab.show()

➊ We construct an array of time points between t = 0 and t = T.

➋ For each time point find the roots of the above cubic polynomial. Only one of the
roots is physically meaningful, in that $0 \le h \le 2R$ (the height of the level of liquid
cannot be negative or greater than the diameter of the tank), so we extract that root (by
boolean indexing) and store it in the array h.

Finally, we plot h as a function of time (Figure 6.6).

Figure 6.6 The height of liquid as a function of time, h(t), for the spherical tank problem.
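One way to gain confidence in the selected root is to substitute it back into the volume formula and compare with the volume remaining, V0 - Ft. The sketch below does this at three intermediate times; the helper function height and its numerical tolerances are assumptions made for illustration and are not part of Listing 6.7.

```python
import numpy as np
from numpy.polynomial import Polynomial

R, F = 1.5, 2.e-4                 # tank radius (m) and flow rate (m^3/s)
V0 = 4/3 * np.pi * R**3           # volume of the full tank
T = V0 / F                        # time taken to empty the tank

def height(t):
    """Return the liquid height at time t: the real root in [0, 2R]."""
    p = Polynomial([F*t - V0, 0, np.pi * R, -np.pi / 3])
    roots = p.roots()
    # keep roots that are numerically real and lie inside the tank
    real = roots.real[np.abs(roots.imag) < 1e-9]
    inside = real[(real >= -1e-9) & (real <= 2*R + 1e-9)]
    return inside[0]

for t in (0.25*T, 0.5*T, 0.75*T):
    h = height(t)
    V = np.pi * R * h**2 - np.pi * h**3 / 3
    # the volume at height h should equal the volume not yet drained
    print(np.isclose(V, V0 - F*t))
```

By symmetry, the tank is half empty when h = R, so height(0.5*T) should return the radius.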


6.4.4 Calculus 


Polynomials can be differentiated with the Polynomial.deriv method. By default,
this function returns the first derivative, but the optional argument m can be set to return
the mth derivative:

In [x]: print(p)
poly([ 6. -5. 1.])    # 6 - 5x + x^2

In [x]: print(p.deriv())
poly([-5. 2.])

In [x]: print(p.deriv(2))
poly([ 2.])

A Polynomial object can also be integrated with an optional lower bound, L, and
constant of integration, k, treated as shown in the following example:

$$\int_L^x (2 - 3x)\,\mathrm{d}x = \left[2x - \tfrac{3}{2}x^2\right]_L^x + k.$$

By default, L and k are zero, but can be specified by passing the arguments lbnd and k
to the Polynomial.integ method:

In [x]: print(q)
poly([ 2. -3.])

In [x]: print(q.integ())
poly([ 0. 2. -1.5])

In [x]: print(q.integ(lbnd=1))
poly([-0.5 2. -1.5])

In [x]: print(q.integ(k=2))
poly([ 2. 2. -1.5])

Polynomials can be integrated repeatedly by passing a value to m, giving the number of 
integrations to perform. 20 
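The relationship between deriv and integ, and the use of the antiderivative to evaluate a definite integral, can be sketched as follows. The limits a and b are arbitrary values chosen for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

q = Polynomial([2, -3])
Q = q.integ()               # antiderivative with lbnd=0 and k=0

# Differentiating the antiderivative recovers q
print(np.allclose(Q.deriv().coef, q.coef))

# Definite integral of q from a to b via the antiderivative
a, b = 1, 3
print(Q(b) - Q(a))
# Equivalently, with lbnd=a the antiderivative vanishes at x = a
print(q.integ(lbnd=a)(b))

# Two successive integrations are the same as passing m=2
print(np.allclose(q.integ(m=2).coef, Q.integ().coef))
```

Both definite-integral expressions evaluate to -8, in agreement with integrating 2 - 3x between 1 and 3 by hand.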


6.4.5 ♢ Classical orthogonal polynomials


In addition to the Polynomial class representing simple power series such as
$a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$, NumPy provides classes to represent a series composed of any
of a number of classical orthogonal polynomials. These polynomials and linear combinations
of them are widely used in physics, statistics and mathematics. As of NumPy
version 1.8, the polynomial convenience classes provided are Chebyshev, Legendre,
Laguerre, Hermite (“physicists’ version”) and HermiteE (“probabilists’ version”).
Many good textbooks exist describing the properties of these polynomial classes; to
illustrate their use we will focus here on the Legendre polynomials, 21 denoted $P_n(x)$.
These are the solutions to Legendre’s differential equation,

$$\frac{\mathrm{d}}{\mathrm{d}x}\left[(1 - x^2)\frac{\mathrm{d}}{\mathrm{d}x}P_n(x)\right] + n(n + 1)P_n(x) = 0.$$


The first few Legendre polynomials are

$$P_0(x) = 1$$
$$P_1(x) = x$$
$$P_2(x) = \tfrac{1}{2}(3x^2 - 1)$$
$$P_3(x) = \tfrac{1}{2}(5x^3 - 3x)$$
$$P_4(x) = \tfrac{1}{8}(35x^4 - 30x^2 + 3)$$

and are plotted in Figure 6.7.

A useful property of the Legendre polynomials is their orthogonality on the interval
$[-1, 1]$:

$$\int_{-1}^{1} P_n(x)P_m(x)\,\mathrm{d}x = \frac{2}{2n+1}\delta_{mn},$$

which is important in their use as a basis for representing suitable functions. 22

To create a linear combination of Legendre polynomials, pass the coefficients to
the Legendre constructor, just as for Polynomial. For example, to construct the
polynomial expansion $5P_1(x) + 2P_2(x)$:


20 Different constants of integration for each can be specified by setting k to an array of values. 

21 The Legendre polynomials are named after the French mathematician Adrien-Marie Legendre (1752-1833);
for 200 years until 2005 many publications mistakenly used a portrait of the unrelated French
politician Louis Legendre as that of the mathematician.

22 In particular, in physics, the multipole expansion of electrostatic potentials.

Figure 6.7 The first five Legendre polynomials, $P_n(x)$ for n = 0, 1, 2, 3, 4.


In [x]: Legendre = np.polynomial.Legendre 
In [x]: A = Legendre([0, 5, 2]) 

An existing polynomial object can be converted into a Legendre series with the cast 
method: 

In [x]: P = Polynomial([0,1,1]) 

In [x]: Q = Legendre.cast(P) 

In [x]: print(Q) 

leg([ 0.33333333 1. 0.66666667]) 

That is, $x + x^2 = \tfrac{1}{3}P_0 + P_1 + \tfrac{2}{3}P_2$.

An instance of a single Legendre polynomial basis function can be created with the 
basis method: 

In [x]: L3 = Legendre.basis(3) 

creates an object representing $P_3(x)$, and is equivalent to calling Legendre([0, 0, 0, 1]).
To obtain a regular power series, we can cast it back to a Polynomial:

In [x]: print(Polynomial.cast(L3)) 
poly([0. -1.5 0. 2.5]) 
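With basis functions available as objects, the orthogonality relation given earlier can be checked numerically. This is a minimal sketch: the helper overlap is a name introduced here, and it relies on the multiplication and integ operations that the Legendre class shares with Polynomial.

```python
import numpy as np
Legendre = np.polynomial.Legendre

def overlap(n, m):
    """Integrate P_n(x) P_m(x) over [-1, 1] using the Legendre class."""
    product = Legendre.basis(n) * Legendre.basis(m)
    antideriv = product.integ()
    return antideriv(1) - antideriv(-1)

# Compare with 2/(2n+1) for n = m and with zero otherwise
for n in range(4):
    for m in range(4):
        expected = 2 / (2*n + 1) if n == m else 0.0
        assert np.isclose(overlap(n, m), expected)

print(overlap(2, 2))    # close to 2/5
```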

In addition to the functions just described for Polynomial, including differentiation
and integration of polynomial series, the convenience classes for the classical orthogonal
polynomials expose several useful methods.

convert converts between different kinds of polynomials. For example, the linear
combination $A(x) = 5P_1(x) + 2P_2(x) = 5x + 2 \cdot \tfrac{1}{2}(3x^2 - 1) = -1 + 5x + 3x^2$, as a
power series of monomials (a Maclaurin series), is represented by an instance of the
Polynomial class as:

In [x]: A = Legendre([0, 5, 2])

In [x]: B = A.convert(kind=Polynomial)

In [x]: print(B)
poly([-1. 5. 3.])

Because the objects A and B represent the same underlying function (just expanded in
different basis sets) they evaluate to the same value when given the same x, and have
the same roots:

In [x]: A(-2) == B(-2)
Out[x]: True

In [x]: print(A.roots(), B.roots(), sep='\n')
[-1.84712709 0.18046042]
[-1.84712709 0.18046042]

6.4.6 Fitting polynomials 

A common use of polynomial expansions is in fitting and approximating data series.
NumPy’s polynomial modules provide methods for the least squares fitting of functions.
The fit function of the polynomial convenience classes is described in this
section. 23

The domain and window attributes 

A typical one-dimensional fitting problem requires the best-fit polynomial to a finite,
continuous function over some finite region of the x-axis (the domain). However, polynomials
themselves can differ from each other wildly and diverge as $x \to \pm\infty$. This
makes any attempt to blindly find the least squares fit on the domain of the function itself
potentially risky: the fitted polynomial is frequently subject to numerical instability,
overflow, underflow and other types of ill-conditioning (see Section 9.2). As an example,
consider the function

$$f(x) = e^{-\sin 40x}$$

in the interval (100, 100.1). There is nothing particularly tricky about this function:
it is well-behaved everywhere and $f(x)$ takes very moderate values between $e^{-1}$ and
$e^{1}$. Yet a straightforward least squares fit to a fourth-order polynomial on this domain
gives:

$$-11.881851 + 2379.22228x - 119.741202x^2 - 23828009.7x^3 + 1192894610x^4$$

with a clear potential for numerical instability and loss of accuracy with even moderate
values of x: our approximation to $f(x)$ is built up from the difference between very large
monomial terms.

23 Note: The older np.poly1d class representing one-dimensional polynomials is still available (as of
NumPy 1.9) for backward-compatibility reasons. It is documented at http://docs.scipy.org/doc/numpy/
reference/routines.polynomials.poly1d.html and provides a simpler but less reliable least squares fitting
method, polyfit. It is recommended, however, to use the new Polynomial class in new code.

Each class of polynomial has a default window over which it is optimal to take a
linear combination in fitting a function. For example, the Legendre polynomials window
is the region $[-1, 1]$ plotted above, on which $P_n(x)$ are orthogonal and everywhere
$|P_n(x)| \le 1$. The problem is that it is rather unlikely that the function to be fitted falls
within the chosen polynomials’ window. It is therefore necessary to relate the domain
of the function to the window. This is done by shifting and scaling the x-axis: that is, by
mapping points in the function’s domain to points in the fitting polynomials’ window.
The polynomial fit function does this automatically, so the fourth-order least squares
fit to the earlier mentioned function yields

In [x]: x = np.linspace(100, 100.1, 1001) 

In [x]: f = lambda x: np.exp(-np.sin(40*x)) 

In [x]: p = Polynomial.fit(x, f(x), 4) 

In [x]: print(p) 

poly([ 1.49422551 -2.54641449 0.63284641 1.84246463 -1.02821956])

The domain and window of a polynomial can be inspected as the attributes domain and 
window respectively: 

In [x]: p.domain 
array([ 100. , 100.1]) 

In [x]: p.window 
array([-1., 1.]) 

It is important to note that the argument x is mapped from the domain to the window
whenever a polynomial is evaluated. This means that two polynomials with different
domains and/or windows may evaluate to different values even if they have the same
coefficients. For example, if we create a Polynomial object from scratch with the same
coefficients as the fitted polynomial p above:

In [x]: q = Polynomial([1.49422551, -2.54641449, 0.63284641,
                        1.84246463, -1.02821956])

it is created with the default domain and window, which are both (-1,1):

In [x]: print(q.domain, q.window) 

[-1. 1.] [-1. 1.] 

and so evaluating q at 100.05, say, maps 100.05 in the domain to 100.05 in the window 
and gives a very different answer from the evaluation of p at the same point in the 
domain (which maps to 0. in the window): 

In [x]: q(100.05), p(100.05)

(-101176442.96772559, 1.4942255113760108) 

It is easy to show that the mapping function from x in a domain (a, b) to x′ in a window
(a′, b′) is

$$x' = m(x) = \chi + \mu x, \quad\text{where}\quad \mu = \frac{b' - a'}{b - a}, \quad \chi = b' - b\,\frac{b' - a'}{b - a}.$$

These are the parameters returned by the polynomial’s mapparms function:

In [x]: chi, mu = p.mapparms() 

In [x]: print(chi, mu) 

-2001.0, 20.0 

Therefore,

In [x]: print(q(chi + mu*100.05))

1.49422551

It is possible to change domain and window by direct assignment:

In [x]: q.domain = np.array((100., 100.1))

In [x]: print(q(100.05))
1.49422551

To evaluate a polynomial on a set number of evenly distributed points in its domain,
for example, to plot it, use the Polynomial’s linspace method:

In [x]: p.linspace(5)
Out[x]:
(array([ 100. , 100.025, 100.05 , 100.075, 100.1 ]),
 array([ 1.80280222, 2.63107256, 1.49422551, 0.54527422, 0.39490249]))

p.linspace returns two arrays with the specified number of samples on the polynomial’s
domain representing the x points and the values the polynomial takes at those
points, p(x).
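The domain-to-window mapping m(x) described earlier in this section can also be written out by hand and compared with mapparms. In the sketch below, the function map_to_window and the coefficients [1, 2, 3] are illustrative assumptions; only the domain and window match the fitted example.

```python
import numpy as np
from numpy.polynomial import Polynomial

def map_to_window(x, domain, window):
    """Map x from domain (a, b) to window (a', b') as chi + mu*x."""
    a, b = domain
    a_w, b_w = window
    mu = (b_w - a_w) / (b - a)
    chi = b_w - b * mu
    return chi + mu * x

# A polynomial with arbitrary coefficients on the domain used above
p = Polynomial([1., 2., 3.], domain=[100., 100.1], window=[-1., 1.])
chi, mu = p.mapparms()
x = 100.05

# The hand-written mapping agrees with the parameters from mapparms
print(np.isclose(map_to_window(x, p.domain, p.window), chi + mu*x))

# Evaluating p at x is the same as evaluating a default-domain
# polynomial with the same coefficients at the mapped point
q = Polynomial(p.coef)
print(np.isclose(p(x), q(map_to_window(x, p.domain, p.window))))
```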

Polynomial.fit 

The Polynomial method fit returns a least squares fitted polynomial to data, y,
sampled at values x. In its simplest use, fit needs only to be passed array-like objects
x and y, and a value for deg, the degree of polynomial to fit. It returns the polynomial
which minimizes the sum of the squared errors,

$$E = \sum_i |y_i - p(x_i)|^2.$$

For example,

In [x]: x = np.linspace(400, 700, 1000)

In [x]: y = 1 / x**4

In [x]: p = Polynomial.fit(x, y, 3)

produces the best-fit cubic polynomial to the function $x^{-4}$ on the interval (400, 700).

Weighted least-squares fitting is achieved by setting the argument, w, to a sequence
of weighting values that is the same length as x and y. The polynomial returned is that
which minimizes the sum of the weighted squared errors,

$$E = \sum_i w_i^2 |y_i - p(x_i)|^2.$$

The domain and window of the fitted polynomial may be specified with the
arguments domain and window; by default a minimal domain covering the points x
is used.
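As an illustration of weighting, the following sketch fits a straight line to synthetic data in which the second half of the points is much noisier than the first. The data, the seed and the choice of weights w = 1/sigma are all assumptions made for this example; convert() is used to express the fitted coefficients in terms of x itself rather than the scaled window variable.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)           # fixed seed, chosen arbitrarily
x = np.linspace(0, 10, 50)
sigma = np.where(x < 5, 0.1, 2.0)        # hypothetical uncertainties
y = 1.0 + 0.5 * x + rng.normal(0, sigma)

p_unweighted = Polynomial.fit(x, y, 1)
# Weight each point by the reciprocal of its uncertainty, so that
# E is the sum of ((y_i - p(x_i)) / sigma_i)**2
p_weighted = Polynomial.fit(x, y, 1, w=1/sigma)

# Coefficients (intercept, gradient) in the unscaled variable x
print(p_unweighted.convert().coef)
print(p_weighted.convert().coef)
```

The weighted fit is dominated by the precise points at x < 5, so its coefficients would be expected to lie closer to the underlying values of 1.0 and 0.5.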

It is wise to check the quality of the fit before using the returned polynomial. Setting 
the argument full=True causes fit to return two objects: the fitted polynomial and a 
list of various statistics about the fit itself: 

In [x]: deg = 3

In [x]: p, [resid, rank, sing_val, rcond] = Polynomial.fit(x, y, deg, full=True)

In [x]: p
Out[x]:
Polynomial([ 1.07041864e-11, -1.16488662e-11, 1.02545751e-11,
            -5.64068914e-12], [ 400., 700.], [-1., 1.])

In [x]: resid
Out[x]: array([ 4.57180972e-23])

In [x]: rank
Out[x]: 4

In [x]: sing_val
Out[x]: array([ 1.3843828 , 1.32111941, 0.50462215, 0.28893641])

In [x]: rcond
Out[x]: 2.2204460492503131e-13

This list can be analyzed to see how well the polynomial function fits the data. resid
is the sum of the squared residuals,

$$\mathrm{resid} = \sum_i |y_i - p(x_i)|^2$$

(a smaller value indicates a better fit). rank and sing_val are the rank and singular
values of the matrix inverted in the least squares algorithm to find the polynomial
coefficients: ill-conditioning of this matrix can lead to poor fits (particularly if the fitted
polynomial degree is too high). rcond is the cutoff ratio for small singular values within
this matrix: values smaller than this value are set to zero in the fit (to protect the fit
from spurious artifacts introduced by round-off error) and a RankWarning is
issued. If this happens, the data may be too noisy or not well described by the polynomial
of the specified degree. Note that least squares fitting should always be carried out at
double precision, and beware of “over-fitting” the data (attempting to fit a function
with too many coefficients, i.e., a polynomial of too high order).


Example E6.11 A straight-line best fit is just a special case of a polynomial least
squares fit (with deg=1). Consider the following data giving the absorbance over a
path length of 5 mm of UV light at 280 nm, A, by a protein as a function of the
concentration, [P]:

[P] / μg/mL        A
       0         2.287
      20         3.528
      40         4.336
      80         6.909
     120         8.274
     180        12.855
     260        16.085
     400        24.797
     800        49.058
    1500        89.400

We expect the absorbance to be linearly related to the protein concentration:
$A = m[\mathrm{P}] + A_0$, where $A_0$ is the absorbance in the absence of protein
(e.g., due to the solvent and experimental components).
Listing 6.8 Straight line fit to absorbance data

# eg6-polyfit.py
import numpy as np
import pylab

Polynomial = np.polynomial.Polynomial

# The data: conc = [P] and absorbance, A
conc = np.array([0, 20, 40, 80, 120, 180, 260, 400, 800, 1500])
A = np.array([2.287, 3.528, 4.336, 6.909, 8.274, 12.855, 16.085, 24.797,
              49.058, 89.400])

cmin, cmax = min(conc), max(conc)
pfit, stats = Polynomial.fit(conc, A, 1, full=True, window=(cmin, cmax),
                             domain=(cmin, cmax))

print('Raw fit results:', pfit, stats, sep='\n')
A0, m = pfit
resid, rank, sing_val, rcond = stats
rms = np.sqrt(resid[0]/len(A))

print('Fit: A = {:.3f}[P] + {:.3f}'.format(m, A0),
      '(rms residual = {:.4f})'.format(rms))

pylab.plot(conc, A, 'o', color='k')
pylab.plot(conc, pfit(conc), color='k')
pylab.xlabel(r'[P] /$\mathrm{\mu g\cdot mL^{-1}}$')
pylab.ylabel('Absorbance')
pylab.show()


The output shows a good straight-line fit to the data (Figure 6.8): 

Raw fit results: 

poly([ 1.92896129 0.0583057 ]) 

[array([ 2.47932733]), 2, array([ 1.26633786, 0.62959385]), 2.2204460492503131e-15] 

Fit: A = 0.058[P] + 1.929 (rms residual = 0.4979) 



Figure 6.8 Line of least squares best fit to absorbance data as a function of concentration. 
Table 6.6 Radius of the ball of fire produced by the “Trinity” nuclear
test as a function of time

t /ms    R /m      t /ms    R /m      t /ms    R /m
0.1      11.1      1.36     42.8       4.34     65.6
0.24     19.9      1.50     44.4       4.61     67.3
0.38     25.4      1.65     46.0      15.0     106.5
0.52     28.8      1.79     46.9      25.0     130.0
0.66     31.9      1.93     48.7      34.0     145.0
0.80     34.2      3.26     59.0      53.0     175.0
0.94     36.3      3.53     61.1      62.0     185.0
1.08     38.9      3.80     62.9
1.22     41.0      4.07     64.3



Note: This data can be downloaded from scipython.com/ex/afg 


6.4.7 Exercises 

Questions 

Q6.4.1 The third derivative of the polynomial function $P(x) = 3x^3 + 2x - 7$ is 18, so
why does the following evaluate as False?

In [x]: Polynomial((-7, 2, 0, 3)).deriv(3) == 18 
Out[x]: False 


Q6.4.2 Find and classify the stationary points of the polynomial
$f(x) = (x^2 + x - 11)^2 + (x^2 + x - 7)^2$.


Problems 

P6.4.1 The expansion of the spherical ball of fire generated in an explosion may be
analyzed to deduce the initial energy, $E$, released by a nuclear weapon. The British
physicist Geoffrey Taylor used dimensional analysis to demonstrate that the radius of
this sphere, $R(t)$, should be related to $E$, the air density, $\rho_\mathrm{air}$,
and time, $t$, through

$$R(t) = C E^{1/5} \rho_\mathrm{air}^{-1/5} t^{2/5},$$

where, using model shock-wave problems, Taylor estimated the dimensionless constant
$C \approx 1$. Using the data obtained from declassified timed images of the first New
Mexico atomic explosion, Taylor confirmed this law and produced an estimate of the
(then unknown) value of $E$. Use a log-log plot to fit the data in Table 6.6 (see
footnote 24) to the model and confirm the time-dependence of $R$. Taking
$\rho_\mathrm{air} = 1.25\ \mathrm{kg\,m^{-3}}$, deduce $E$ and express its value in
Joules and in “kilotons of TNT” where the explosive energy released by 1 ton of TNT is
arbitrarily defined to be $4.184 \times 10^9$ J.


P6.4.2 Find the mean and variance of both x and y, the correlation coefficient and the 
equation of the linear regression line for each of the four data sets given in Table 6.7. 
Comment on these values in the light of a plot of the data. 


24 G. I. Taylor, (1950) Proc. Roy. Soc. London A201, 159. 
Table 6.7 Four sample data sets for analysis of mean, variance and correlation

 x1      y1       x2      y2       x3      y3       x4      y4
10.0     8.04    10.0     9.14    10.0     7.46     8.0     6.58
 8.0     6.95     8.0     8.14     8.0     6.77     8.0     5.76
13.0     7.58    13.0     8.74    13.0    12.74     8.0     7.71
 9.0     8.81     9.0     8.77     9.0     7.11     8.0     8.84
11.0     8.33    11.0     9.26    11.0     7.81     8.0     8.47
14.0     9.96    14.0     8.10    14.0     8.84     8.0     7.04
 6.0     7.24     6.0     6.13     6.0     6.08     8.0     5.25
 4.0     4.26     4.0     3.10     4.0     5.39    19.0    12.50
12.0    10.84    12.0     9.13    12.0     8.15     8.0     5.56
 7.0     4.82     7.0     7.26     7.0     6.42     8.0     7.91
 5.0     5.68     5.0     4.74     5.0     5.73     8.0     6.89


These data can be downloaded as the file ex6-4-a-anscombe.tex from 
scipython.com/ex/aff . 

P6.4.3 The van der Waals equation of state may be written as follows to give the
pressure, $p$, of a gas from its molar volume, $V$, and temperature, $T$:

$$p = \frac{RT}{V - b} - \frac{a}{V^2},$$

where $a$ and $b$ are molecule-specific constants and
$R = 8.314\ \mathrm{J\,K^{-1}\,mol^{-1}}$ is the gas constant. It can readily be
rearranged to yield the temperature for a given pressure and volume, but its form
giving the molar volume in terms of pressure and temperature is a cubic equation:

$$pV^3 - (pb + RT)V^2 + aV - ab = 0.$$

Of the three roots to this equation, below the critical point, $(T_c, p_c)$, all are
real: the largest and smallest give the molar volume of the gas phase and liquid phase
respectively; above the critical point, where no liquid phase exists, only one root is
real and gives the molar volume of the gas (also known in this region as a
supercritical fluid). The critical point is given by the condition
$(\partial p/\partial V)_T = (\partial^2 p/\partial V^2)_T = 0$ and for a van der Waals
gas is given by the formulas

$$T_c = \frac{8a}{27Rb}, \qquad p_c = \frac{a}{27b^2}.$$

For ammonia the van der Waals constants are $a = 4.225\ \mathrm{L^2\,bar\,mol^{-2}}$
and $b = 0.03707\ \mathrm{L\,mol^{-1}}$.

a. Find the critical point of ammonia, and then determine the molar volume at room
temperature and pressure, (298 K, 1 atm) and at (500 K, 12 MPa).

b. An isotherm is the set of points $(p, V)$ at a constant temperature satisfying an
equation of state. Plot the isotherm ($p$ against $V$) for ammonia at 350 K using the
van der Waals equation of state and compare it with the 350 K isotherm for an
ideal gas, which has the equation of state $p = RT/V$.

P6.4.4 The first-stage rockets of the Saturn V rocket that launched the Apollo 11
mission generated an acceleration which increases with time throughout their operation
(mostly because of the decrease in mass as it burns its fuel). This acceleration may
be modeled (in units of $\mathrm{m\,s^{-2}}$) as a function of time after launch, $t$
in seconds, by the quadratic function:

$$a(t) = 2.198 + (2.842 \times 10^{-2})\,t + (1.061 \times 10^{-3})\,t^2.$$

Determine the distance traveled by the rocket at the end of the stage-one center-engine
burn, 2 minutes, 15.2 seconds, after launch.

(Harder) Assuming a constant lapse rate of
$\Gamma = -\mathrm{d}T/\mathrm{d}z = 6\ \mathrm{K\,km^{-1}}$ and a ground
temperature of 302 K, at what time and altitude, $z$, did the rocket achieve Mach 1?
During the relevant phase of the launch, take the average pitch angle to be 12°, and
assume the speed of sound can be calculated as a function of absolute temperature
to be

$$c = \sqrt{\frac{\gamma R T}{M}},$$

where the constant $\gamma = 1.4$ and the mean molar mass of the atmosphere is
$M = 0.0288\ \mathrm{kg\,mol^{-1}}$.


6.5 Linear algebra 


6.5.1 Basic matrix operations

Although NumPy does have a matrix object (see Section 6.6), all the same matrix
operations can be carried out on a regular two-dimensional NumPy array. These include
scalar multiplication, matrix (dot) product, elementwise multiplication and transpose:

In [x]: A = np.array([[0, 0.5], [-1, 2]])
In [x]: A
Out[x]:
array([[ 0. ,  0.5],
       [-1. ,  2. ]])

In [x]: A * 5    # multiplication by a scalar
Out[x]:
array([[  0. ,   2.5],
       [ -5. ,  10. ]])

In [x]: B = np.array([[2, -0.5], [3, 1.5]])
In [x]: B
Out[x]:
array([[ 2. , -0.5],
       [ 3. ,  1.5]])

In [x]: A.dot(B)    # or np.dot(A, B): matrix product
Out[x]:
array([[ 1.5 ,  0.75],
       [ 4.  ,  3.5 ]])

In [x]: A * B    # elementwise multiplication
Out[x]:
array([[ 0.  , -0.25],
       [-3.  ,  3.  ]])

In [x]: A.transpose()    # or simply A.T
Out[x]:
array([[ 0. , -1. ],
       [ 0.5,  2. ]])

Note that the transpose returns a view on the original matrix.
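The consequence of the transpose being a view can be sketched as follows: assigning to an element of the transpose also changes the original array. The variable names At and At_copy are introduced here for illustration only.

```python
import numpy as np

A = np.array([[0, 0.5], [-1, 2]])
At = A.T                  # a view: no data is copied

At[0, 1] = 99.0           # element (0, 1) of the view is element (1, 0) of A
print(A)
print(np.shares_memory(A, At))

At_copy = A.T.copy()      # an independent copy, if one is needed
```

If the transposed array is to be modified without affecting the original, an explicit copy, as in the last line, is one way to achieve this.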

The identity matrix is returned by passing the two dimensions of the matrix to the
method np.eye:

In [x]: np.eye(3, 3)
Out[x]:
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])


Matrix products 

NumPy contains further methods for vector and matrix products. For example, 

In [x]: a = np.array([1, 2, 3])
In [x]: b = np.array([0, 1, 2])
In [x]: np.inner(a, b)    # inner product; here, the same as a.dot(b)
Out[x]: 8

In [x]: np.outer(a, b)    # outer product
Out[x]:
array([[0, 1, 2],
       [0, 2, 4],
       [0, 3, 6]])

To raise a matrix to an (integer) power, however, requires a method from the
np.linalg module:

In [x]: A = np.array([[0, 0.5], [-1, 2]])
In [x]: np.linalg.matrix_power(A, 3)    # the same as A.dot(A.dot(A))
Out[x]:
array([[-1.  ,  1.75],
       [-3.5 ,  6.  ]])

Note that the ** operator performs elementwise exponentiation:

In [x]: A**3    # the same as A * A * A
Out[x]:
array([[ 0.   ,  0.125],
       [-1.   ,  8.   ]])
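The distinction between the two kinds of power can be checked directly. This sketch assumes Python 3.5 or later, where the @ operator is available for matrix products; the names cubed, by_hand and elementwise are illustrative.

```python
import numpy as np

A = np.array([[0, 0.5], [-1, 2]])

cubed = np.linalg.matrix_power(A, 3)    # matrix product A.A.A
by_hand = A @ A @ A                     # the same product written with @
elementwise = A**3                      # each element cubed individually

print(np.allclose(cubed, by_hand))
print(np.allclose(cubed, elementwise))
```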


Other matrix properties 

The norm of a matrix or vector is returned by the function np.linalg.norm. It is
possible to calculate several different norms (see the documentation), but the ones used
by default are the Frobenius norm for two-dimensional arrays:

$$\|A\| = \left( \sum_{i,j} |a_{ij}|^2 \right)^{1/2},$$

and the Euclidean norm for one-dimensional arrays:

$$\|a\| = \left( \sum_i |z_i|^2 \right)^{1/2} = \sqrt{|z_0|^2 + |z_1|^2 + \cdots + |z_{n-1}|^2}.$$

Thus, 

In [x]: np.linalg.norm(A) 

Out[x]: 2.2912878474779199 

In [x]: c = np.array([1, 2j, 1 - 1j])

In [x]: np.linalg.norm(c)

Out[x]: 2.6457513110645907    # sqrt(1 + 4 + 2)
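As a check on these definitions, both default norms can be computed by hand from the absolute values of the elements. This is only a sketch; the names frobenius and euclidean are illustrative.

```python
import numpy as np

A = np.array([[0, 0.5], [-1, 2]])
c = np.array([1, 2j, 1 - 1j])

# Square root of the sum of the squared moduli of all elements.
frobenius = np.sqrt(np.sum(np.abs(A)**2))
euclidean = np.sqrt(np.sum(np.abs(c)**2))

print(frobenius, np.linalg.norm(A))
print(euclidean, np.linalg.norm(c))
```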

The function np.linalg.det returns the determinant of a matrix, and the regular
NumPy function np.trace returns its trace (the sum of its diagonal elements):

In [x]: np.linalg.det(A) 

Out[x]: 0.5 

In [x]: np.trace(A) 

Out[x]: 2.0 

The rank of a matrix is obtained using np.linalg.matrix_rank:

In [x]: np.linalg.matrix_rank(A)    # matrix A has full rank
Out[x]: 2

In [x]: D = np.array([[1, 1], [2, 2]])    # a rank deficient matrix
In [x]: np.linalg.matrix_rank(D)
Out[x]: 1

To find the inverse of a square matrix, use np.linalg.inv. A LinAlgError exception
is raised if the matrix inversion fails:

In [x]: np.linalg.inv(A)
Out[x]:
array([[ 4., -1.],
       [ 2.,  0.]])

In [x]: np.linalg.inv(D) 

LinAlgError: Singular matrix 
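In a program, rather than an interactive session, the failure can be caught and handled. The following is a minimal sketch; setting D_inv to None on failure is an arbitrary choice made for this illustration.

```python
import numpy as np

D = np.array([[1, 1], [2, 2]])    # singular: the second row is twice the first

try:
    D_inv = np.linalg.inv(D)
except np.linalg.LinAlgError as err:
    D_inv = None
    print('Inversion failed:', err)
```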


6.5.2 Eigenvalues and eigenvectors 

To calculate the eigenvalues and (right) eigenvectors of a general square array with
shape (n,n), use np.linalg.eig, which returns the eigenvalues, w, as an array of
shape (n,) and the normalized eigenvectors, v, as a complex array of shape (n,n).
The eigenvalues are not returned in any particular order, but the eigenvalue w[i]
corresponds to the eigenvector v[:,i]. Note that the eigenvectors are arranged in
columns. If the eigenvalue calculation does not converge for some reason a
LinAlgError is raised.

In [x]: vals, vecs = np.linalg.eig(A)
In [x]: vals
Out[x]: array([ 0.29289322,  1.70710678])

➊ In [x]: np.isclose(np.sum(vals), A.trace())
Out[x]: True

In [x]: vecs
Out[x]:
array([[-0.86285621, -0.28108464],
       [-0.50544947, -0.95968298]])

➊ Verify that the sum of the eigenvalues is equal to the matrix trace.

If the matrix is Hermitian or real-symmetric, the function np.linalg.eigh may be
used instead. This method takes an additional argument, UPLO, which can be 'L' or
'U' according to whether the lower or upper triangular part of the matrix is used. The
default is 'L'.

Two additional methods, np.linalg.eigvals and np.linalg.eigvalsh, return
only the eigenvalues (and not the eigenvectors) of a general and Hermitian matrix
respectively.
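A short sketch of np.linalg.eigh and np.linalg.eigvalsh applied to a small real-symmetric matrix follows. The matrix S and its entries are made up for illustration; each column of the returned eigenvector array is checked against the defining relation Sv = wv.

```python
import numpy as np

S = np.array([[2.0, 1.0], [1.0, 3.0]])    # real-symmetric (illustrative values)

vals, vecs = np.linalg.eigh(S)            # eigenvalues are in ascending order

# Each column vecs[:, i] should satisfy S v = w v for the eigenvalue vals[i].
checks = [np.allclose(S @ vecs[:, i], vals[i] * vecs[:, i])
          for i in range(len(vals))]
print(checks)

only_vals = np.linalg.eigvalsh(S)         # eigenvalues only
print(np.allclose(vals, only_vals))
```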

Since NumPy version 1.8, these and most other linalg methods follow the usual 
broadcasting rules so that several matrices can be operated on at once: each matrix is 
assumed to be stored in the last two dimensions. For example, we may work with an 
array with shape (3,2,2) representing the three 2x2 Pauli matrices: 



In [x]: pauli_matrices = np.array((
   ...:     ((0, 1), (1, 0)),       # sigma_x
   ...:     ((0, -1j), (1j, 0)),    # sigma_y
   ...:     ((1, 0), (0, -1))       # sigma_z
   ...: ))

In [x]: np.linalg.eigh(pauli_matrices)
Out[x]:
(array([[-1.,  1.],
        [-1.,  1.],
        [-1.,  1.]]),
 array([[[-0.70710678+0.j        ,  0.70710678+0.j        ],
         [ 0.70710678+0.j        ,  0.70710678+0.j        ]],

        [[-0.70710678-0.j        , -0.70710678+0.j        ],
         [ 0.00000000+0.70710678j,  0.00000000-0.70710678j]],

        [[ 0.00000000+0.j        ,  1.00000000+0.j        ],
         [ 1.00000000+0.j        ,  0.00000000+0.j        ]]]))
6.5.3 Solving equations

Linear scalar equations

NumPy provides an efficient and numerically stable method for solving systems of
linear scalar equations. The set of equations

$$
\begin{aligned}
m_{11}x_1 + m_{12}x_2 + \cdots + m_{1n}x_n &= b_1 \\
m_{21}x_1 + m_{22}x_2 + \cdots + m_{2n}x_n &= b_2 \\
&\;\;\vdots \\
m_{n1}x_1 + m_{n2}x_2 + \cdots + m_{nn}x_n &= b_n
\end{aligned}
$$

can be expressed as the matrix equation Mx = b:

$$
\begin{pmatrix}
m_{11} & m_{12} & \cdots & m_{1n} \\
m_{21} & m_{22} & \cdots & m_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
m_{n1} & m_{n2} & \cdots & m_{nn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}.
$$

The solution of this system of equations (the vector x) is returned by the
np.linalg.solve method. For example, the three simultaneous equations

$$
\begin{aligned}
3x - 2y &= 8 \\
-2x + y - 3z &= -20 \\
4x + 6y + z &= 7
\end{aligned}
$$

can be represented as the matrix equation Mx = b:

$$
\begin{pmatrix} 3 & -2 & 0 \\ -2 & 1 & -3 \\ 4 & 6 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix} 8 \\ -20 \\ 7 \end{pmatrix},
$$

and solved by passing arrays corresponding to matrix M and vector b to
np.linalg.solve:

In [x]: M = np.array([[3, -2, 0], [-2, 1, -3], [4, 6, 1]])
In [x]: b = np.array([8, -20, 7])
In [x]: np.linalg.solve(M, b)
Out[x]: array([ 2., -1.,  5.])

That is, $x = 2$, $y = -1$, $z = 5$.

If no unique solution exists (for nonsquare or singular matrix, M), a LinAlgError 
is raised. 
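A simple way to gain confidence in a solution is to substitute it back into the original equations. The sketch below does this for the three simultaneous equations given above; the name residual is illustrative.

```python
import numpy as np

M = np.array([[3, -2, 0], [-2, 1, -3], [4, 6, 1]])
b = np.array([8, -20, 7])

x = np.linalg.solve(M, b)
residual = M @ x - b      # should be zero to within round-off error

print(x)
print(np.allclose(residual, 0))
```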

Linear least squares solutions (“best fit”)

Where a set of equations, Mx = b, does not have a unique solution, a least squares
solution that minimizes the $L^2$ norm, $\|b - Mx\|^2$ (sum of squared residuals), may
be sought using the np.linalg.lstsq method. This is the type of problem described as
over-determined (more data points than the two unknown quantities, $m$ and $c$). Passed
M and b, np.linalg.lstsq returns the solution array x, the sum of squared residuals,
the rank of M and the singular values of M.
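The four returned objects can be unpacked as in the following sketch, which fits a straight line, y = mx + c, to a handful of made-up data points. The data values are assumptions chosen for illustration, and rcond=None is passed explicitly to select NumPy's current default cutoff for small singular values.

```python
import numpy as np

# Illustrative data scattered about y = 2x + 1 (values made up for this sketch).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Each row of M is (x_i, 1), so that M @ (m, c) approximates y.
M = np.vstack((x, np.ones(len(x)))).T

(m, c), resid, rank, sing_val = np.linalg.lstsq(M, y, rcond=None)

print('m =', m, 'c =', c)
print('sum of squared residuals:', resid)
print('rank of M:', rank)
```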

A typical use of this method is to find the “line of best-fit”, y = mx+c, through some 
data thought to be linearly related as in the following example. 


Example E6.12 The Beer-Lambert Law relates the concentration, $c$, of a substance in
a solution sample to the intensity of light transmitted through the sample, $I_t$,
across a given path length, $l$, at a given wavelength, $\lambda$:

$$I_t = I_0 e^{-\alpha c l},$$

where $I_0$ is the incident light intensity and $\alpha$ is the absorption coefficient
at $\lambda$.

Given a series of measurements of the fraction of light transmitted, $I_t/I_0$,
$\alpha$ may be determined through a least squares fit to the straight line:

$$y = \ln\frac{I_t}{I_0} = -\alpha c l.$$

Although this line passes through the origin ($y = 0$ for $c = 0$), we will fit the
more general linear relationship:

$$y = mc + k,$$

where $m = -\alpha l$, and verify that $k$ is close to zero.

Given a sample with path length $l = 0.8$ cm, the following data were measured for
$I_t/I_0$ at five different concentrations:

c /M     It/I0
0.4      0.886
0.6      0.833
0.8      0.784
1.0      0.738
1.2      0.694

The matrix form of the least squares equation to be solved is

$$
\begin{pmatrix} c_1 & 1 \\ c_2 & 1 \\ c_3 & 1 \\ c_4 & 1 \\ c_5 & 1 \end{pmatrix}
\begin{pmatrix} m \\ k \end{pmatrix}
=
\begin{pmatrix} T_1 \\ T_2 \\ T_3 \\ T_4 \\ T_5 \end{pmatrix},
$$

where $T = \ln(I_t/I_0)$. The code here determines $m$ and hence $\alpha$ using
np.linalg.lstsq:


Listing 6.9 Linear least squares fitting of the Beer-Lambert Law 

# eg6-beer-lambert-lstsq.py 
import numpy as np 
import pylab 
# Path length, cm
path = 0.8

# The data: concentrations (M) and It/I0
c = np.array([0.4, 0.6, 0.8, 1.0, 1.2])
It_over_I0 = np.array([0.891, 0.841, 0.783, 0.744, 0.692])

n = len(c)
A = np.vstack((c, np.ones(n))).T
T = np.log(It_over_I0)

x, resid, _, _ = np.linalg.lstsq(A, T)    # ➊
m, k = x
alpha = - m / path

print('alpha = {:.3f} M-1.cm-1'.format(alpha))
print('k =', k)
print('rms residual = ', np.sqrt(resid[0]))

pylab.plot(c, T, 'o')
pylab.plot(c, m*c + k)
pylab.xlabel(r'$c\;/\mathrm{M}$')
pylab.ylabel(r'$\ln(I_\mathrm{t}/I_0)$')
pylab.show()

➊ Here, _ is the dummy variable name conventionally given to an object we do not
need to store or use.

The output produces a best fit value of $\alpha = 0.393\ \mathrm{M^{-1}\,cm^{-1}}$ and
a value of $k$ compatible with experimental error:

alpha = 0.393 M-1.cm-1
k = 0.0118109033334
rms residual = 0.0096843591966

Figure 6.9 shows the data and fitted line. 



Figure 6.9 Line of least squares best fit to absorbance data as a function of concentration. 
6.5.4 Exercises

Questions

Q6.5.1 Demonstrate that the three Pauli matrices given in Section 6.5.2 are unitary.
That is, that $\sigma_p^\dagger \sigma_p = I_2$ for $p = x, y, z$, where $I_2$ is the
2 × 2 identity matrix and † denotes the Hermitian conjugate (conjugate transpose).

Q6.5.2 The ticker timer, much used in school physics experiments, is a device that
marks dots on a strip of paper tape at evenly spaced intervals of time as the tape moves
through it at some (possibly variable) speed. The following data relate to the positions
(in cm) of marks on a tape pulled through a ticker timer by a falling weight. The marks
are made every 1/10 sec.

x = [1.3, 6.0, 20.2, 43.9, 77.0, 119.6, 171.7, 233.2, 304.2, 384.7,
     474.7, 574.1, 683.0, 801.3, 929.2, 1066.4, 1213.2, 1369.4, 1535.1,
     1710.3, 1894.9]

Fit these data to the function $x = x_0 + v_0 t + \tfrac{1}{2} g t^2$ and determine an
approximate value for the acceleration due to gravity, $g$.


Problems 

P6.5.1 In physics, the Planck units of measurement are those defined such that the
five universal physical constants, $c$ (the speed of light), $G$ (the gravitational
constant), $\hbar$ (the reduced Planck constant), $(4\pi\epsilon_0)^{-1}$ (the Coulomb
constant) and $k_\mathrm{B}$ (the Boltzmann constant) are set to unity. The dimensions
of these quantities in terms of length (L), mass (M), time (T), charge (Q) and
thermodynamic temperature (Θ) are given in Table 6.8, along with their values in SI
units.

This suggests the following matrix relationship between the constants and their
dimensions:

$$
\begin{array}{c|rrrrr}
 & \mathrm{L} & \mathrm{M} & \mathrm{T} & \mathrm{Q} & \Theta \\ \hline
c & 1 & 0 & -1 & 0 & 0 \\
G & 3 & -1 & -2 & 0 & 0 \\
\hbar & 2 & 1 & -1 & 0 & 0 \\
(4\pi\epsilon_0)^{-1} & 3 & 1 & -2 & -2 & 0 \\
k_\mathrm{B} & 2 & 1 & -2 & 0 & -1
\end{array}
$$

Table 6.8 Some physical constants and their dimensions

Symbol       Name                      Value (SI units)                        Dimensions
c            Speed of light            2.99792458 × 10^8 m s^-1                L T^-1
G            Gravitational constant    6.67384 × 10^-11 m^3 kg^-1 s^-2         L^3 M^-1 T^-2
ħ            Reduced Planck constant   1.054571726 × 10^-34 J s                L^2 M T^-1
(4πε0)^-1    Coulomb constant          8.9875517873681764 × 10^9 N m^2 C^-2    L^3 M T^-2 Q^-2
kB           Boltzmann constant        1.3806488 × 10^-23 J K^-1               L^2 M T^-2 Θ^-1

Using the inverse of this matrix, determine the SI values of length, mass, time, charge
and temperature in the base Planck units; that is, the combination of these physical
constants yielding the dimensions L, M, T, Q and Θ. For example, the Planck length is
found to be $l_\mathrm{P} = \sqrt{\hbar G/c^3} = 1.616199 \times 10^{-35}$ m.


P6.5.2 The (symmetric) matrix representing the inertia tensor of a collection of
masses, $m_i$, with positions $(x_i, y_i, z_i)$ relative to their center of mass is

$$
I = \begin{pmatrix}
I_{xx} & I_{xy} & I_{xz} \\
I_{xy} & I_{yy} & I_{yz} \\
I_{xz} & I_{yz} & I_{zz}
\end{pmatrix},
$$

where

$$
I_{xx} = \sum_i m_i (y_i^2 + z_i^2), \quad
I_{yy} = \sum_i m_i (x_i^2 + z_i^2), \quad
I_{zz} = \sum_i m_i (x_i^2 + y_i^2),
$$

$$
I_{xy} = -\sum_i m_i x_i y_i, \quad
I_{yz} = -\sum_i m_i y_i z_i, \quad
I_{xz} = -\sum_i m_i x_i z_i.
$$

There exists a transformation of the coordinate frame such that this matrix is diagonal:
the axes of this transformed frame are called the principal axes and the diagonal inertia
matrix elements, $I_a \le I_b \le I_c$, are the principal moments of inertia.

Write a program to calculate the principal moments of inertia of a molecule, given the
position and masses of its atoms relative to some arbitrary origin. Your program should
first relocate the atom coordinates relative to its center of mass and then determine the
principal moments of inertia as the eigenvalues of the matrix $I$.

A molecule may be classified as follows according to the relative values of $I_a$,
$I_b$ and $I_c$:

• $I_a = I_b = I_c$: spherical top;

• $I_a = I_b < I_c$: oblate symmetric top;

• $I_a < I_b = I_c$: prolate symmetric top;

• $I_a < I_b < I_c$: asymmetric top.

Determine the principal moments of inertia and classify the molecules NH3, CH4,
CH3Cl and O3 given the data available at scipython.com/ex/afh . Also determine
the rotational constants, $A$, $B$ and $C$, related to the moments of inertia through
$Q = h/(8\pi^2 c I_q)$ ($Q = A, B, C$; $q = a, b, c$) and usually expressed in
$\mathrm{cm^{-1}}$.


P6.5.3 The NumPy method numpy.linalg.svd returns the singular value decomposition
(SVD) of a matrix, $M$, as the arrays $U$, $\Sigma$ and $V$ satisfying the
factorization: $M = U\Sigma V^\dagger$, where † denotes the Hermitian conjugate (the
conjugate transpose).

The SVD and the eigendecomposition are related in that the left-singular row vectors,
$U$, are the eigenvectors of $MM^\dagger$ and the right-singular column vectors, $V$,
are the eigenvectors of $M^\dagger M$. Furthermore, the diagonal entries of $\Sigma$
are the square roots of the nonzero eigenvalues of both $MM^\dagger$ and
$M^\dagger M$.

Show that this is the case for the special case of $M$ a 3 × 3 matrix with random
real entries by comparing the output of numpy.linalg.svd with that of
numpy.linalg.eig.

Hint: the singular values of $M$ are sorted in descending order, but the eigenvalues
returned by numpy.linalg.eig are in no particular order. Both methods produce
normalized eigenvectors, but may differ by sign (ignore the possibility that any of the
eigenvalues could have an eigenspace with dimension greater than 1).


6.6 Matrices 

The NumPy matrix class is a subclass of the regular ndarray which provides some 
convenient functionality for dealing with matrices. There are some important differ- 
ences from conventional arrays, and care should be taken in using some of the familiar 
array operations as they have been overridden in the matrix subclass and behave differ- 
ently. 

A matrix is always a two-dimensional array. Even a row or column matrix has
two dimensions (with shape (1,n) or (n,1) respectively), and flattening a matrix
with flatten or ravel (see Section 6.1.5) returns a (1,n) array rather than a
one-dimensional array.


6.6.1 Creating a matrix

As an alternative to the regular ndarray construction methods, a matrix object can be
created using a MATLAB-like syntax: a string of values in which columns are
separated by spaces and rows by semicolons:

In [x]: A = np.matrix([[0, -1], [1, -2]])    # as for np.array()
In [x]: B = np.matrix('0 -1; 1 -2')          # MATLAB-like
In [x]: print(B)
[[ 0 -1]
 [ 1 -2]]

The data type of the matrix can be set with the dtype argument as for regular arrays. If
a matrix is created from an existing ndarray object, the default behavior is to copy the
data into the new matrix object; to construct a matrix consisting of a view on an existing
ndarray’s data, set the argument copy=False:

In [x]: a = np.array([[1, 2],
In [x]: A = np.matrix(a, copy=False)
In [x]: B = np.matrix(a)
In [x]: a[0,0] = -1
In [x]: print(A[0,0], B[0,0])
-1 1

That is, A is updated by the assignment a[0,0] = -1, but B owns its own data and
is not updated. Special matrices such as the identity matrix are best created using the
corresponding ndarray constructor and passing the resulting array object to matrix
with copy=False:
In [x]: I = np.matrix(np.eye(2,2), copy=False)
In [x]: N = np.matrix(np.zeros((2,2)), copy=False)
In [x]: W = np.matrix(np.ones((2,2)), copy=False)


Example E6.13 One way to create the two-dimensional rotation matrix,

$$R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},$$

which rotates points in the xy plane counterclockwise through θ = 30° about the origin:

In [x]: theta = np.radians(30)
In [x]: c, s = np.cos(theta), np.sin(theta)
In [x]: R = np.matrix('{} {}; {} {}'.format(c, -s, s, c))
In [x]: print(R)
[[ 0.8660254 -0.5      ]
 [ 0.5        0.8660254]]
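For comparison, the same rotation matrix can be built as a regular two-dimensional array and applied to a point. This is a sketch only: the point p and the name rotated are illustrative, and the @ operator assumes Python 3.5 or later.

```python
import numpy as np

theta = np.radians(30)
c, s = np.cos(theta), np.sin(theta)

# The rotation matrix as a regular two-dimensional array.
R = np.array([[c, -s], [s, c]])

p = np.array([1.0, 0.0])    # a point on the x-axis
rotated = R @ p             # rotated counterclockwise through 30 degrees

print(rotated)
print(np.allclose(R @ R.T, np.eye(2)))    # a rotation matrix is orthogonal
```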


6.6.2 Matrix operations 

The most important difference between matrix objects and arrays is in the behavior of 
the * and ** operators. As we have seen, these act elementwise on ndarrays: 

In [x]: a = np.array([[0, -1], [1, -2]])
In [x]: a * a    # arrays: elementwise (Hadamard) product
Out[x]:
array([[0, 1],
       [1, 4]])

In [x]: a ** 3    # arrays: elementwise exponentiation
Out[x]:
array([[ 0, -1],
       [ 1, -8]])

That is,

$$
\begin{pmatrix} 0 \cdot 0 & (-1)(-1) \\ 1 \cdot 1 & (-2)(-2) \end{pmatrix}
= \begin{pmatrix} 0 & 1 \\ 1 & 4 \end{pmatrix}
\quad \text{and} \quad
\begin{pmatrix} 0^3 & (-1)^3 \\ 1^3 & (-2)^3 \end{pmatrix}
= \begin{pmatrix} 0 & -1 \\ 1 & -8 \end{pmatrix}.
$$

For matrix objects these operators are matrix multiplication and exponentiation:

In [x]: A = np.matrix([[0, -1], [1, -2]])
In [x]: A * A    # matrix multiplication
matrix([[-1,  2],
        [-2,  3]])

In [x]: A ** 3    # i.e., A.A.A
matrix([[ 2, -3],
        [ 3, -4]])

That is,

$$
\begin{pmatrix} 0 & -1 \\ 1 & -2 \end{pmatrix}^2
= \begin{pmatrix} -1 & 2 \\ -2 & 3 \end{pmatrix}
\quad \text{and} \quad
\begin{pmatrix} 0 & -1 \\ 1 & -2 \end{pmatrix}^3
= \begin{pmatrix} 2 & -3 \\ 3 & -4 \end{pmatrix}.
$$

This simplifies the otherwise slightly cumbersome equivalents: a.dot(a) and
a.dot(a).dot(a). Note that for both ndarray and matrix objects, multiplication
by a scalar acts elementwise:

In [x]: A * 4
matrix([[ 0, -4],
        [ 4, -8]])

As might be expected, matrix operations that already have methods implemented by the
ndarray class are retained, including transpose (also available as the T attribute) and
diagonal. Additionally, there are attributes for the Hermitian transpose (H) and matrix
inverse (I). If the matrix is singular, a LinAlgError exception is raised if an attempt
is made to take its inverse.

For eigenvectors and eigenvalues, see the description of NumPy’s linalg module
(Section 6.5.2).


Example E6.14 The matrix B, defined here, may be manipulated as follows:

$$
B = \begin{pmatrix} 1 & 3 - j \\ 3j & -1 + j \end{pmatrix}, \qquad
B^T = \begin{pmatrix} 1 & 3j \\ 3 - j & -1 + j \end{pmatrix},
$$

$$
B^\dagger = \begin{pmatrix} 1 & -3j \\ 3 + j & -1 - j \end{pmatrix}, \qquad
B^{-1} = \begin{pmatrix}
-\frac{1}{20} - \frac{3}{20}j & \frac{1}{20} - \frac{7}{20}j \\
\frac{3}{10} + \frac{3}{20}j & -\frac{1}{20} + \frac{1}{10}j
\end{pmatrix}.
$$

In [x]: B = np.matrix([[1, 3-1j], [3j, -1+1j]])
In [x]: print(B)
[[ 1.+0.j  3.-1.j]
 [ 0.+3.j -1.+1.j]]

In [x]: print(B.T)
[[ 1.+0.j  0.+3.j]
 [ 3.-1.j -1.+1.j]]

In [x]: print(B.H)
[[ 1.-0.j  0.-3.j]
 [ 3.+1.j -1.-1.j]]

In [x]: print(B.I)
[[-0.05-0.15j  0.05-0.35j]
 [ 0.30+0.15j -0.05+0.1j ]]

Note that although these derived matrices look like attributes, they are not calculated
until requested,25 and so the use of the matrix class is not significantly slower than
using regular ndarrays.

A few other common matrix operations are found elsewhere in the NumPy package,
including the trace, determinant, eigenvalues and (right) eigenvectors:

In [x]: print(np.trace(B))
1j

In [x]: print(np.linalg.det(B))
(-4-8j)

In [x]: eigenvalues, eigenvectors = np.linalg.eig(B)
In [x]: print(eigenvalues, eigenvectors, sep='\n\n')
[ 2.50851535+2.09456868j -2.50851535-1.09456868j]

[[ 0.77468569+0.j         -0.52924821+0.38116633j]
 [ 0.18832434+0.60365224j  0.75802940+0.j        ]]


6.6.3 Should you use NumPy matrices? 

The NumPy matrix class is convenient if you have a lot of operations to perform with
matrices and like the MATLAB-style syntax for manipulating them, but it does not
provide any functionality that isn’t already available to ndarray objects. The
multiplication operator, *, acting to produce matrix products can make code clearer but
other common matrix operations still require the use of the main NumPy library’s modules
and functions. Indeed, the matrix class’s insistence on turning everything into a
two-dimensional array can be rather trying. For example, a 1 × n row matrix must be
indexed M[0, j] where j = 0, 1, ..., n − 1, and, bizarrely, even the trace method called
on a matrix object returns a two-dimensional matrix object:

In [x]: A.trace()
matrix([[-2]])    # ?!

In [x]: np.trace(A)    # recommended alternative
-2

In short, while matrix objects may have the edge for simple calculations in an
interactive session, they do not have much to commend them over regular ndarrays to any
but the most die-hard MATLAB fans.26
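To illustrate the alternative, the matrix operations of Section 6.6.2 can be written for a regular ndarray as in the sketch below. It assumes Python 3.5 or later, in which the @ infix operator for matrix multiplication is available; the variable names are illustrative.

```python
import numpy as np

a = np.array([[0, -1], [1, -2]])

product = a @ a                        # matrix product (A * A for np.matrix)
cube = np.linalg.matrix_power(a, 3)    # matrix power (A ** 3 for np.matrix)
trace = np.trace(a)                    # a scalar, not a 1 x 1 matrix

print(product)
print(cube)
print(trace)
```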


Example E6.15 The currents flowing in the closed regions labeled $I_1$, $I_2$ and
$I_3$ of the circuit given here may be analyzed by mesh analysis.


25 They are properties of the matrix class which, in this case, are really class methods masquerading as 
attributes. 

26 Moreover, at the time of writing it seems that Python 3.5 is likely to include a specific infix operator for 
matrix multiplication, @. 
[Figure: circuit with three closed loops carrying the currents $I_1$, $I_2$ and $I_3$.]

For each closed loop, we can apply Kirchhoff’s Voltage Law ($\sum_k V_k = 0$) in
conjunction with Ohm’s Law ($V = IR$), to give three simultaneous equations:

$$
\begin{aligned}
50I_1 - 30I_3 &= 80, \\
40I_2 - 20I_3 &= 80, \\
-30I_1 - 20I_2 + 100I_3 &= 0.
\end{aligned}
$$

These can be expressed in matrix form as RI = V:

$$
\begin{pmatrix} 50 & 0 & -30 \\ 0 & 40 & -20 \\ -30 & -20 & 100 \end{pmatrix}
\begin{pmatrix} I_1 \\ I_2 \\ I_3 \end{pmatrix}
=
\begin{pmatrix} 80 \\ 80 \\ 0 \end{pmatrix}.
$$

We could use the numerically stable np.linalg.solve method (Section 6.5.3) to
find the loop currents, I, here, but in this well-behaved system,27 let’s find them by
left multiplication by the matrix inverse, $R^{-1}$:

$$R^{-1}RI = I = R^{-1}V.$$

Using NumPy’s matrix module:

In [x]: R = np.matrix('50 0 -30; 0 40 -20; -30 -20 100')
In [x]: V = np.matrix('80; 80; 0')
In [x]: I = np.linalg.inv(R) * V
In [x]: print(I)
[[ 2.33333333]
 [ 2.61111111]
 [ 1.22222222]]

Thus, $I_1 = 2.33$ A, $I_2 = 2.61$ A, $I_3 = 1.22$ A.
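For comparison, the same loop currents may be obtained with np.linalg.solve acting on regular arrays, which avoids forming the inverse explicitly. This is a sketch only; the array names mirror those used above.

```python
import numpy as np

R = np.array([[50, 0, -30], [0, 40, -20], [-30, -20, 100]])
V = np.array([80, 80, 0])

I = np.linalg.solve(R, V)    # loop currents I1, I2, I3 in amperes
print(I)
```

The result is a one-dimensional array rather than a column matrix, so the individual currents are indexed as I[0], I[1] and I[2].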


27 In general, matrix inversion may be an ill-conditioned problem, but this particular matrix is easy to invert
accurately. See Section 9.2.2 for more on conditioning.
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6.6.4 Exercises 


Problems 


P6.6.1 Let the column matrix 



describe the number of non-negative integers less than 10" (n > 0) that do (p n ) and do 


not ( q n ) contain the digit 5. Hence, for n = \,p\ = 1 and q\ = 9. Devise a matrix-based 
recursion relation for finding F n+ i from F n . 

How many numbers less than 10 10 contain the digit 5? 

For each n < 10, lind p n and verify that p n = 10" — 9". 

P6.6.2 The matrix 



can be used to produce the Fibonacci sequence by repeated multiplication: the element 
F n n of the matrix F" is the (n + l)th Fibonacci number (for n = 0,1,2---). Use 
NumPy’s matrix objects to calculate the first 10 Fibonacci numbers. 

One can show that 


F" = CD"C -1 , where D = C _1 FC 


is the diagonal matrix related to F through the similarity transformation associated with 
matrix c. Use this relationship to lind the 1 lOOth Fibonacci number. 

P6.6.3 The implicit formula for a conic section may be written as the second-degree
polynomial,

$$Q = Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0,$$

or in matrix form using the homogeneous coordinate vector,

$$\mathbf{x} = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix},$$

as $\mathbf{x}^T \mathbf{Q} \mathbf{x} = 0$, where

$$\mathbf{Q} = \begin{pmatrix} A & B/2 & D/2 \\ B/2 & C & E/2 \\ D/2 & E/2 & F \end{pmatrix}.$$

Conic sections may be classified according to the following properties of $\mathbf{Q}$, where the
submatrix $\mathbf{Q}_{33}$ is

$$\mathbf{Q}_{33} = \begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix}.$$

• If $\det \mathbf{Q} = 0$, the conic is degenerate in one of the following forms:

    if $\det \mathbf{Q}_{33} < 0$, the equation represents two intersecting lines,
    if $\det \mathbf{Q}_{33} = 0$, the equation represents two parallel lines,
    if $\det \mathbf{Q}_{33} > 0$, the equation represents a single point.

• If $\det \mathbf{Q} \ne 0$, the conic is non-degenerate:

    if $\det \mathbf{Q}_{33} < 0$, the conic is a hyperbola,
    if $\det \mathbf{Q}_{33} = 0$, the conic is a parabola,
    if $\det \mathbf{Q}_{33} > 0$, the conic is an ellipse.

  If $A = C$ and $B = 0$, the ellipse is a circle.

Write a program to classify the conic section represented by the six coefficients
A, B, C, D, E and F.

Some test-cases (coefficients not given are zero):

• Hyperbola: B = 1, F = −9.

• Parabola: A = 1, D = 2, E =

• Circle: A = 1, C = 1, D = −2, E = −3, F = 2.

• Ellipse: A = 9, C = 4, F = −36.

• Two parallel lines: A = 1, F = −1.

• A single point: A = 1, C = 1.


6.7 Random sampling 

NumPy’s random module provides methods for obtaining random numbers from any
of several distributions as well as convenient ways to choose random entries from an
array and to randomly shuffle the contents of an array.

As with the core library’s random module (Section 4.5.1), np.random uses a
Mersenne Twister pseudorandom number generator (PRNG). The way it seeds itself is
operating-system dependent, but it can be reseeded with an integer (or an array of
integers) by calling np.random.seed. For example, using the randint method
described here:

In [x]: np.random.seed(42)
In [x]: np.random.randint(1, 10, 10)    # 10 random integers in [1,10)
array([7, 4, 8, 5, 7, 3, 7, 8, 5, 4])
In [x]: np.random.randint(1, 10, 10)
array([8, 8, 3, 6, 5, 2, 8, 6, 2, 5])
In [x]: np.random.randint(1, 10, 10)
array([1, 6, 9, 1, 3, 7, 4, 9, 3, 5])
In [x]: np.random.seed(42)              # reseed the PRNG
In [x]: np.random.randint(1, 10, 10)
array([7, 4, 8, 5, 7, 3, 7, 8, 5, 4])   # same as before
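
Reseeding the global generator affects every piece of code that uses np.random. As a
minimal sketch of an alternative, and assuming a release of NumPy (1.17 or later) that
provides np.random.default_rng, an independent generator object can be seeded instead.
The names rng1, rng2, first and second are illustrative only. Note that such generators
use a different algorithm from the Mersenne Twister by default, so the numbers they
produce will not match those shown above.

```python
import numpy as np

# Assumption: NumPy >= 1.17, which provides np.random.default_rng.
rng1 = np.random.default_rng(42)   # an independent generator seeded with 42
rng2 = np.random.default_rng(42)   # a second generator with the same seed

first = rng1.integers(1, 10, size=10)    # 10 random integers in [1,10)
second = rng2.integers(1, 10, size=10)

# Identically seeded generators produce identical sequences.
print(np.array_equal(first, second))
```

Because each generator carries its own state, two parts of a program can draw
reproducible random numbers without interfering with one another.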
Figure 6.10 Histogram of 10,000 random samples from the uniform distribution on [0,1)
provided by np.random.random_sample().


6.7.1 Uniformly distributed random numbers 

Random floating point numbers 

The basic random method, random_sample, 28 takes the shape of an array as its
argument and creates an array of the corresponding shape filled with numbers sampled
randomly from the uniform distribution over [0,1); that is, the interval between 0 and 1
inclusive of 0 but exclusive of 1:

In [x]: np.random.random_sample((3,2))
array([[ 0.92338355,  0.2978852 ],
       [ 0.75175429,  0.88110707],
       [ 0.16759816,  0.32203783]])

(called without an argument, it returns a single random number). If you want numbers
sampled from the uniform distribution over [a, b), you need to do a bit of work:

In [x]: a, b = 10, 20
In [x]: a + (b-a) * np.random.random_sample((3,2))
array([[ 18.07084068,  12.11591797],
       [ 14.08171741,  19.34857282],
       [ 13.06759203,  11.07003867]])
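
The same rescaling is also available directly: as a short sketch, np.random.uniform
takes the limits of the interval as its low and high arguments and samples from
[low, high). The variable names direct and manual are illustrative only.

```python
import numpy as np

a, b = 10, 20
# np.random.uniform samples from the half-open interval [low, high).
direct = np.random.uniform(a, b, size=(3, 2))
# The manual rescaling of samples from [0,1) described in the text.
manual = a + (b - a) * np.random.random_sample((3, 2))
print(direct)
print(manual)
```

Both arrays hold numbers drawn from the same distribution, although the values
themselves differ because they come from separate calls to the generator.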

In a uniform distribution, every number has the same probability of being sampled, 
as can be seen from a histogram of a large number of samples (Figure 6.10): 

In [x]: pylab.hist(np.random.random_sample(10000), bins=100) 

In [x]: pylab.show() 

The np.random.rand method is similar, but is passed the dimensions of the desired
array as separate arguments. For example,

In [x]: np.random.rand(2,3)
Out[x]:
array([[ 0.61075227,  0.37459455,  0.95670676],
       [ 0.25276732,  0.1601836 ,  0.3746576 ]])

28 np.random.random_sample is also available under the aliases np.random.random, np.random.ranf
and np.random.sample.


Random integers 

Sampling random integers is supported through a couple of methods. The
np.random.randint method takes up to three arguments low, high and size:

• If both low and high are supplied, then the random number(s) are sampled from
the discrete half-open interval [low, high). 29

• If low is supplied but high is not, then the sampled interval is [0, low).

• size is the shape of the array of random integers desired. If it is omitted, as with
np.random.rand, a single random integer is returned.


In [x]: np.random.randint(4)                # random integer from [0,4)
2
In [x]: np.random.randint(4, size=10)       # 10 random integers from [0,4)
array([3, 2, 2, 2, 0, 2, 2, 1, 3, 1])
In [x]: np.random.randint(4, size=(3,5))    # array of random ints from [0,4)
array([[0, 1, 1, 2, 2],
       [2, 0, 3, 3, 0],
       [0, 1, 0, 1, 1]])
In [x]: np.random.randint(1, 4, (3,5))      # array of random ints from [1,4)
array([[1, 1, 1, 3, 2],
       [1, 1, 2, 1, 3],
       [1, 3, 1, 3, 1]])


np.random.randint can be useful for selecting random elements (with replacement)
from an array by picking random indexes:

In [x]: a = np.array([6,6,6,7,7,7,7,7,7])
In [x]: a[np.random.randint(len(a), size=5)]
array([7, 7, 7, 6, 7])

The other method for sampling random integers, np.random.random_integers, has
the same syntax but returns integers sampled from the uniform distribution over the
closed interval [low, high] (if high is supplied) or [1, low] (if it is not). (This method
is deprecated in more recent releases of NumPy, which recommend randint instead.)


Example E6.16 These random integer methods can be used for sampling from a set of
evenly spaced real numbers, though it requires a bit of extra work: to pick a number
from n evenly spaced real numbers between a and b (inclusive), use

In [x]: a + (b-a) * (np.random.random_integers(n) - 1) / (n-1.)

For example, to sample from $[\frac{1}{2}, \frac{3}{2}, \frac{5}{2}, \frac{7}{2}]$,

In [x]: a, b, n = 0.5, 3.5, 4
In [x]: a + (b-a) * (np.random.random_integers(n, size=10) - 1) / (n-1.)
array([ 1.5,  0.5,  1.5,  1.5,  3.5,  2.5,  3.5,  3.5,  3.5,  3.5])


29 Note that this is different from the behavior of the standard library’s random.randint(a, b) method (see
Section 4.5.1) which picks numbers uniformly from the closed interval, [a, b].
Example E6.17 In a famous experiment, a group of volunteers are asked to toss a fair
coin 100 times and note down the results of each toss (heads, H, or tails, T). It is generally
easy to spot the participants who fake the results by writing down what they think is a
random sequence of Hs and Ts instead of actually tossing the coin because they tend not
to include as many “streaks” of repeated results as would be expected by chance.

If they had access to a Python interpreter, here’s how they could produce a more
plausibly random set of results:

In [x]: res = ['H', 'T']
In [37]: tosses = ''.join([res[i] for i in np.random.randint(2, size=100)])
In [38]: tosses
Out[38]: 'TTHHTHHTTHHHTHTTHHHTHHTHTTHHTHHTTTTHHHHHHHHTTTHTTHHHHHHHTHHHTHHHH
THTTTHTTHHHHTHTTTTHTTTHTHHTTHHHHHHH'

This virtual experiment features a run of eight heads in a row, and two runs of seven
heads in a row:


     TAILS |   | HEADS
           | 8 | *
           | 7 | **
           | 6 |
           | 5 |
        ** | 4 | **
       *** | 3 | ***
   ******* | 2 | ******
********** | 1 | ********
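
The run lengths tallied above can be counted programmatically. The following is a
minimal sketch rather than the code used to make the chart in the text: it assumes the
100-toss string given above and uses the standard library’s itertools.groupby and
collections.Counter; the names runs, heads and tails are illustrative only.

```python
from itertools import groupby
from collections import Counter

tosses = ('TTHHTHHTTHHHTHTTHHHTHHTHTTHHTHHTTTTHHHHHHHHTTTHTTHHHHHHHTHHHTHHHH'
          'THTTTHTTHHHHTHTTTTHTTTHTHHTTHHHHHHH')

# groupby splits the string into runs of identical consecutive characters.
runs = [(face, len(list(group))) for face, group in groupby(tosses)]

# Tally how many runs of each length occur for each face of the coin.
heads = Counter(length for face, length in runs if face == 'H')
tails = Counter(length for face, length in runs if face == 'T')

# Print one row per run length, longest first: tails on the left, heads on the right.
for length in range(max(heads | tails), 0, -1):
    print('{:>10s} | {:d} | {:s}'.format('*' * tails[length], length,
                                         '*' * heads[length]))
```

A Counter returns zero for a run length that never occurs, so the rows for runs of five
and six are simply printed empty.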


6.7.2 Random numbers from nonuniform distributions 

The full range of random distributions supported by NumPy is described in the official 
documentation. 30 In the next section we describe in detail only the normal, binomial 
and Poisson distributions. 


The normal distribution 

The normal probability distribution is described by the Gaussian function,

$$P(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right),$$

where $\mu$ is the mean and $\sigma$ the standard deviation. The NumPy function,
np.random.normal, selects random samples from the normal distribution. The mean and
standard deviation are specified by loc and scale respectively, which default to 0 and 1.
The shape of the returned array is specified with the size argument.


In [x]: np.random.normal()
-0.34599057326978105
In [x]: np.random.normal(scale=5., size=3)
array([  4.38196707,  -5.17358738,  11.93523167])
In [x]: np.random.normal(100., 8., size=(4,2))
array([[ 107.730434  ,  101.06221195],
       [ 100.75627505,   88.79995561],
       [  88.82658615,   94.89630767],
       [ 105.91254312,   98.21190741]])

30 http://docs.scipy.org/doc/numpy/reference/routines.random.html.


It is also possible to draw numbers from the standard normal distribution (that with
$\mu = 0$ and $\sigma = 1$) with the np.random.randn method. Like random.rand, this
takes the dimensions of an array as its arguments:

In [x]: np.random.randn(2, 2)
array([[-1.25092263,  2.6291925 ],
       [ 0.34158642,  0.40339403]])

Although np.random.randn does not provide a way to set the mean and standard
deviation explicitly, the standard distribution can be rescaled easily enough:

In [x]: mu, sigma = 100., 8.
In [x]: mu + sigma * np.random.randn(4, 2)
array([[ 104.92454826,   98.84646729],
       [ 109.43568726,   92.9568489 ],
       [  90.21632016,   96.25271625],
       [ 102.65745451,   89.94890264]])


Example E6.18 The normal distribution may be plotted from sampled data as a
histogram (Figure 6.11):

In [x]: mu, sigma = 100., 8.
In [x]: samples = np.random.normal(loc=mu, scale=sigma, size=10000)
In [x]: counts, bins, patches = pylab.hist(samples, bins=100, density=True)
In [x]: pylab.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
   ...             np.exp( -(bins - mu)**2 / (2 * sigma**2) ), lw=2)
In [x]: pylab.show()



Figure 6.11 Histogram of 10,000 random samples from the normal distribution provided by 
np.random.normal. 
6.7.3 The binomial distribution

The binomial probability distribution describes the number of particular outcomes in a
sequence of n Bernoulli trials - that is, n independent experiments, each of which can
yield exactly two possible outcomes (e.g., yes/no, success/failure, heads/tails). If the
probability of a single particular outcome (say, success) is p, the probability that such a
sequence of trials yields exactly k such outcomes is

$$\binom{n}{k} p^k (1-p)^{n-k}, \quad \text{where} \quad \binom{n}{k} = \frac{n!}{k!(n-k)!}.$$

For example, when a fair coin is tossed, the probability of it coming up heads each
time is $\frac{1}{2}$. The probability of getting exactly three heads out of four tosses is therefore
$4\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^1 = \frac{1}{4}$, where the factor of $\binom{4}{3} = 4$ accounts for the four possible equivalent
outcomes: THHH, HTHH, HHTH, HHHT.

To sample from the binomial distribution described by parameters n and p, use
np.random.binomial(n, p). Again, the shape of an array of samples can be
specified with the third argument, size:

In [x]: np.random.binomial(4, 0.5)
2
In [x]: np.random.binomial(4, 0.5, (4,4))
array([[1, 2, 2, 4],
       [2, 1, 3, 2],
       [2, 3, 1, 1],
       [2, 4, 2, 3]])
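
As a quick check on the coin-tossing arithmetic above, the exact probability of three
heads in four tosses can be compared with an estimate from a large number of samples.
This is a sketch only: it assumes Python 3.8 or later for math.comb, and the names
exact, samples and estimate are illustrative.

```python
import numpy as np
from math import comb   # assumption: Python 3.8 or later

n, p, k = 4, 0.5, 3
# Exact probability of exactly k successes in n trials.
exact = comb(n, k) * p**k * (1 - p)**(n - k)

# Estimate the same probability from a large number of simulated sequences.
ntrials = 100000
samples = np.random.binomial(n, p, ntrials)
estimate = np.sum(samples == k) / ntrials

print(exact, estimate)
```

The estimate fluctuates from run to run but should lie close to the exact value of 1/4.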


Example E6.19 There are two stable isotopes of carbon, $^{12}$C and $^{13}$C (the radioactive
$^{14}$C nucleus is present in nature in only trace amounts of the order of parts per trillion).
Taking the abundance of $^{13}$C to be x = 0.0107 (i.e., about 1%), we will calculate the
relative amounts of buckminsterfullerene, C$_{60}$, with exactly zero, one, two, three and
four $^{13}$C atoms. (This is important in nuclear magnetic resonance studies of fullerenes,
for example, because only the $^{13}$C nucleus is magnetic and so detectable by NMR.)

The number of $^{13}$C atoms in a population of carbon atoms sampled at random from
a population with natural isotopic abundance follows a binomial distribution: the
probability that, out of n atoms, m will be $^{13}$C (and therefore n − m will be $^{12}$C) is

$$p_m(n) = \binom{n}{m} x^m (1-x)^{n-m}.$$

We can, of course, calculate $p_m(60)$ exactly from this formula for $0 \le m \le 4$, but we
can also simulate the sampling with the np.random.binomial method:

Listing 6.10 Modeling the distribution of $^{13}$C atoms in C$_{60}$

# eg6-e-c13-a.py
import numpy as np

n, x = 60, 0.0107
mmax = 4
m = np.arange(mmax+1)

# Estimate the abundances by random sampling from the binomial distribution
ntrials = 10000
pbin = np.empty(mmax+1)
for r in m:
    pbin[r] = np.sum(np.random.binomial(n, x, ntrials)==r)/ntrials    # ➊

# Calculate and store the binomial coefficients nCm
nCm = np.empty(mmax+1)
nCm[0] = 1
for r in m[1:]:
    nCm[r] = nCm[r-1] * (n - r + 1) / r

# The "exact" answer from binomial distribution
p = nCm * x**m * (1-x)**(n-m)

print('Abundances of C60 as (13C)[m](12C)[60-m]')
print('m    "Exact"  Estimated')
print('-'*24)
for r in m:
    print('{:1d}    {:6.4f}    {:6.4f}'.format(r, p[r], pbin[r]))

➊ For each value of r in the array m, we sample a large number of times (ntrials)
from the binomial distribution described by n = 60 and probability, x = 0.0107. The
comparison of these sample values with a given value of r yields a boolean array which
can be summed (remembering that True evaluates to 1 and False evaluates to 0);
division by ntrials then gives an estimate of the probability of exactly r atoms being
of type $^{13}$C and the remainder of type $^{12}$C.

The explicit loop over m could be removed by creating an array of shape (ntrials,
mmax+1) containing all the samples, and summing over the first axis of this array in the
comparison with the m array:

samples = np.random.binomial(n, x, (ntrials, mmax+1)) 
pbin = np.sum(samples == m, axis=0) / ntrials 

The abundances of $^{13}$C$_m$$^{12}$C$_{60-m}$ produced by our program are given as the following
output.

Abundances of C60 as (13C)[m](12C)[60-m]
m    "Exact"  Estimated
------------------------
0    0.5244    0.5199
1    0.3403    0.3348
2    0.1086    0.1093
3    0.0227    0.0231
4    0.0035    0.0031

That is, almost 48% of C$_{60}$ molecules contain at least one magnetic nucleus.


6.7.4 The Poisson distribution 

The Poisson distribution describes the probability of a particular number of independent
events occurring in a given interval of time if these events occur at a known average rate.
It is also used for occurrences in specified intervals over other domains such as distance
or volume. The Poisson probability distribution of the number of events, k, is

$$P(k) = \frac{\lambda^k e^{-\lambda}}{k!},$$

where the parameter $\lambda$ is the expected (average) number of events occurring within the
considered interval. 31 The NumPy implementation np.random.poisson takes $\lambda$ as
its first argument (which defaults to 1) and, as before, the shape of the desired array of
samples can be specified with a second argument, size. For example, if I receive an
average of 2.5 emails an hour, a sample of the number of emails I receive each hour
over the next 8 hours could be obtained as:

In [x]: np.random.poisson(2.5, 8)
array([4, 1, 3, 0, 4, 1, 3, 2])


Example E6.20 The endonuclease enzyme EcoRI is used as a restriction enzyme which
cuts DNA at the nucleic acid sequence GAATTC. Suppose a given DNA molecule
contains 12000 base pairs and a 50% G+C content. The Poisson distribution can be
used to predict the probability that EcoRI will fail to cleave this molecule as follows:

The recognition site, GAATTC, consists of six nucleotide base pairs; the probability
that any given six-base sequence corresponds to GAATTC is $1/4^6 = 1/4096$ and
so the expected number of cleavage sites for EcoRI in this DNA molecule is $\lambda =
12000/4096 = 2.93$. From the Poisson distribution, we expect the probability that the
endonuclease will fail to cleave this molecule is therefore

$$P(0) = \frac{\lambda^0 e^{-\lambda}}{0!} = 0.053,$$

or about 5.3%. To simulate the possibilities stochastically:

In [x]: lam = 12000 / 4**6
In [x]: N = 100000
In [x]: np.sum(np.random.poisson(lam, N)==0)/N
Out[x]: 0.053699999999999998
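
The analytical value can also be evaluated directly and compared with the binomial
probability that none of the 12000 positions starts a recognition site. The sketch below
makes the same simplifying assumptions as the example (every position is treated as an
independent trial with probability $1/4^6$); the variable names are illustrative only.

```python
from math import exp

nbases, site_length = 12000, 6
p_site = 1 / 4**site_length          # probability of GAATTC at a given position
lam = nbases * p_site                # expected number of cleavage sites

p0_poisson = exp(-lam)               # Poisson: P(0) = exp(-lambda)
p0_binomial = (1 - p_site)**nbases   # binomial: no site at any of the positions

print('{:.4f} {:.4f}'.format(p0_poisson, p0_binomial))
```

The two results agree closely because here the number of trials is large and the
probability of success in each is small, which is the regime in which the binomial
distribution approaches the Poisson distribution.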


6.7.5 Random selections, shuffling and permutations 

It is often the case that, given an array of values, you wish to pick one or more at random
(with or without replacement). This is the purpose of the np.random.choice method.
Given a single argument, a one-dimensional sequence, it returns a random element
drawn from the sequence:

In [x]: np.random.choice([1, 5, 2, -5, 5, 2, 0])
2
In [x]: np.random.choice(np.arange(10))
7

31 The Poisson distribution is the limit of the binomial distribution as $n \to \infty$ and $p \to 0$ such that $\lambda = np$
tends to some finite constant value.

A second argument, size, controls the shape of the array of random samples returned,
as before. By default, the elements of the sequence are drawn randomly with a uniform
distribution and with replacement; to draw the sample without replacement, set
replace=False.

In [x]: a = np.array([1, 2, 0, -1, 1]) 

In [x]: np.random.choice(a, 6) # six random selections from a 

array([ 1, -1, 2, 1, -1, 1]) 

In [x]: np.random.choice(a, (2,2), replace=False) 

array([[ 2, -1], 

[ 1, 0]]) 

In [x]: np.random.choice(a, (3,2), replace=False) 

... <some traceback information> ... 

ValueError: Cannot take a larger sample than population when 'replace=False' 


This last example shows that, as you might expect, it is not possible to draw a larger 
number of elements than there are in the original population if you are sampling without 
replacement. 

To specify the probability of each element being selected, pass a sequence of the same
length as the population to be sampled as the argument p. The probabilities should sum
to 1.

In [x]: a = np.array([1, 2, 0, -1, 1])
In [x]: np.random.choice(a, 5, p=[0.1, 0.1, 0., 0.7, 0.1])
Out[x]: array([-1, -1, -1, -1,  1])
In [x]: np.random.choice(a, 2, False, p=[0.1, 0.1, 0., 0.8, 0.])  # sample without replacement
Out[x]: array([-1,  2])

There are two methods for permuting the contents of an array: np.random.shuffle
randomly rearranges the order of the elements in place whereas
np.random.permutation makes a copy of the array first, leaving the original unchanged:

In [x]: a = np.arange(6)
In [x]: np.random.permutation(a)
array([4, 2, 5, 1, 3, 0])
In [x]: a
array([0, 1, 2, 3, 4, 5])
In [x]: np.random.shuffle(a)
In [x]: a
array([5, 4, 1, 3, 0, 2])

These methods only act on the first dimension of the array:

In [x]: a = np.arange(6).reshape(3, 2)
In [x]: a
array([[0, 1],
       [2, 3],
       [4, 5]])
In [x]: np.random.permutation(a)    # permutes the rows, but not the columns
array([[2, 3],
       [4, 5],
       [0, 1]])
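
The difference between the two methods is easy to overlook because np.random.shuffle
returns nothing. The following short sketch (with illustrative variable names) makes the
in-place behavior explicit:

```python
import numpy as np

a = np.arange(6).reshape(3, 2)

permuted = np.random.permutation(a)   # shuffled copy; a itself is unchanged here
result = np.random.shuffle(a)         # shuffles the rows of a in place, returns None

print(permuted)
print(a, result)
```

In both cases each row keeps its own contents: only the order of the rows changes.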
6.7.6 Exercises

Questions

Q6.7.1 Explain the difference between

In [x]: a = np.array([6,6,6,7,7,7,7,7,7])
In [x]: a[np.random.randint(len(a), size=5)]
array([7, 7, 7, 6, 7])    # (for example)

and

In [x]: np.random.randint(6, 8, 5)
array([6, 6, 7, 7, 7])    # (for example)

Q6.7.2 In Example E6.16 we used random.random_integers to sample from the
uniform distribution on the floating point numbers $[\frac{1}{2}, \frac{3}{2}, \frac{5}{2}, \frac{7}{2}]$. How can you do the
same using random.randint instead?

Q6.7.3 The American lottery, Mega Millions, at the time of writing, involves the
selection of five numbers out of 75 and one from 15. The jackpot is shared among the players
who match all of their numbers in a corresponding random draw. What is the probability
of winning the jackpot? Write a single line of Python code using NumPy to pick a set
of random numbers for a player.

Q6.7.4 Suppose an n-page book is known to contain m misprints. If the misprints are
independent of one another, the probability of a misprint occurring on a particular page
is p = 1/n and their distribution may be considered to be binomial. Write a short
program to conduct a number of trial virtual “printings” of a book with n = 500, m =
400 and determine the probability, Pr, that a single given page will contain two or more
misprints.

Compare with the result predicted by the Poisson distribution with rate parameter
$\lambda = m/n$.


Problems 


P6.7.1 Simulate an experiment carried out ntrials times in which, for each experi- 
ment, n coins are tossed and the total number of heads each time is recorded. 

Plot the results of the simulation on a suitable histogram and compare with the 
expected binomial distribution of heads. 

P6.7.2 A classic problem, first posed by Georges-Louis Leclerc, Comte de Buffon, can
be stated as follows:

Given a plane ruled with parallel lines a distance d apart, what is the probability that a needle of
length $l \le d$ dropped at random onto the plane will cross a line?

The problem can be solved analytically, yielding the answer $2l/\pi d$: show that this
solution is given approximately for the case $l = d$ using a random simulation (Monte
Carlo) method, that is, by simulating the experiment with a large number of random
orientations of the needle.

A related problem involves dropping a circular coin of radius a onto a floor consisting
of square tiles, each of side d. Show that the probability of a coin crossing a tile edge is
$1 - (d - 2a)^2/d^2$ and confirm it with a Monte Carlo simulation.

P6.7.3 Some bacteria, such as E. coli, possess helical flagella which enable them to
move toward attractants such as nutrients, a process known as chemotaxis. When the
flagella rotate counterclockwise the bacterium is propelled forward; when they rotate
clockwise, it tumbles randomly, changing its orientation. A combination of such
movements enables the bacterium to perform a biased random walk: if the bacterium senses
it is moving up a concentration gradient toward an attractant it will rotate its flagella
counterclockwise more often than clockwise so as to continue moving in that direction;
conversely, if it is moving away it is more likely to rotate its flagella clockwise so as to
tumble with the aim of randomly changing its orientation to one that points it toward
the attractant.

The chemotaxis of E. coli may be modeled (very) simplistically by considering a
bacterium to move in a two-dimensional “world” populated by an attractant with a
constant concentration gradient away from some location. At each of a series of time
steps, a model bacterium detects whether it is moving up or down this gradient and
either continues moving or tumbles according to some pair of probabilities.

Write a Python program to implement this simple model of chemotaxis for a world
consisting of the unit square with an attractant at its center. Plot the locations of 10
model bacteria that start off evenly spaced around the unit circle centered on the
attractant location.

P6.7.4 One way to simulate the meanders in a river is as the average of a large number
of random walks. 32 Using a coordinate system (x, y), start at point A = (0, 0) and aim
to finish at B = (b, 0). Starting from an initial heading of $\theta_0$ from the AB direction, at
each step change this angle by a random amount drawn from a normal distribution with
mean $\mu = 0$ and standard deviation $\sigma$, and proceed by unit distance in this direction.
Discard any walks which do not, after n steps, finish within one unit of B (this will be
the majority!).

Write a program to find the average path meeting the above constraints for b = 10,
using $\theta_0 = 110°$, $\sigma = 17°$, n = 40 and $10^6$ random walk trials. Plot the accepted walks
and their average, which should resemble a meander.


6.8 Discrete Fourier transforms 

6.8.1 One-dimensional Fast Fourier Transforms 

numpy.fft is NumPy’s Fast Fourier Transform (FFT) library for calculating the
discrete Fourier transform (DFT) using the ubiquitous Cooley and Tukey algorithm. 33 The
definition for the DFT of a function defined on n points, $f_m$, $m = 0, 1, 2, \cdots, n-1$, used
by NumPy is

$$F_k = \sum_{m=0}^{n-1} f_m \exp\left(-\frac{2\pi i mk}{n}\right), \quad k = 0, 1, 2, \cdots, n-1. \qquad (6.1)$$

32 B. Hayes, (2006) American Scientist 94, 490; H. von Schelling, General Electric Report No. 64GL92
33 J. W. Cooley and J. W. Tukey, (1965) Math. Comput. 19, 297–301.

NumPy’s basic DFT method, for real and complex functions, is np.fft.fft. If the
input signal function, f, is considered to be in the time domain, the output Fourier
Transform, F, is in the frequency domain and is returned by the fft(f) function
call in a standard order: F[0] is the zero-frequency term, F[1:n/2] are the
positive-frequency terms in increasing order, F[n/2+1:] contains the negative-frequency
terms in order of decreasingly negative frequency, and F[n/2] is the (positive and
negative) Nyquist frequency. 34 np.abs(F), np.abs(F)**2 and np.angle(F) are the
amplitude spectrum, power spectrum and phase spectrum respectively.

The frequency bins corresponding to the values of F are given by np.fft.fftfreq(n, d)
where d is the sample spacing. For even n, this is equivalent to

$$0, \frac{1}{dn}, \frac{2}{dn}, \cdots, \frac{n/2-1}{dn}, -\frac{n/2}{dn}, -\frac{n/2-1}{dn}, \cdots, -\frac{1}{dn}.$$

To shift the spectrum so that the zero-frequency component is at the center, call
np.fft.fftshift. To undo that shift, call np.fft.ifftshift.
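
The ordering of the frequency bins is easiest to see for a small array. As a brief
sketch, take n = 8 samples spaced d = 0.125 s apart (values chosen here only so that the
bins fall on whole numbers):

```python
import numpy as np

n, d = 8, 0.125                # 8 samples spaced 0.125 s apart
freq = np.fft.fftfreq(n, d)
print(freq)                    # 0, 1, 2, 3, then -4, -3, -2, -1

shifted = np.fft.fftshift(freq)
print(shifted)                 # -4, -3, ..., 3: zero frequency now at the center

restored = np.fft.ifftshift(shifted)   # back to the standard order
```

The same shift applied to the transform array itself keeps each value of F aligned with
its frequency.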

For example, consider the following waveform in the time domain with some
synthetic Gaussian noise added:

$$f(t) = 2 \sin(20\pi t) + \sin(100\pi t).$$

In [x]: A1, A2 = 2, 1
In [x]: freq1, freq2 = 10, 50
In [x]: fsamp = 500
In [x]: t = np.arange(0, 1, 1/fsamp)
In [x]: n = len(t)
In [x]: f = A1*np.sin(2*np.pi*freq1*t) + A2*np.sin(2*np.pi*freq2*t)
In [x]: f += 0.2 * np.random.randn(n)
In [x]: pylab.plot(t, f)
In [x]: pylab.xlabel('Time /s')
In [x]: pylab.show()

The plot of this waveform is depicted in Figure 6.12.

The Fourier transform of this function is complex; its real and imaginary components 
are plotted here (Figure 6.13). 

In [x]: F = np.fft.fft(f) 

In [x]: pylab.plot(F.real, 'k', label='real') 

In [x]: pylab.plot(F.imag, 'gray', label='imag') 

In [x]: pylab.legend(loc=2) 

In [x]: pylab.show() 


34 Here, n is assumed to be even. 
Figure 6.12 The noisy waveform referred to in the text.

Figure 6.13 The Fourier transform of a noisy waveform with two frequency components, as
returned by np.fft.fft.


Now look at the shifted amplitude spectrum with the zero-frequency component at 
the center: 35 

In [x]: freq = np.fft.fftfreq(n, 1/fsamp) 

In [x]: F_shifted = np.fft.fftshift(F) 

In [x]: freq_shifted = np.fft.fftshift(freq) 

In [x]: pylab.plot(freq_shifted, np.abs(F_shifted)) 

In [x]: pylab.xlabel('Frequency /Hz') 

In [x]: pylab.show() 

This plot is given in Figure 6.14. 


35 The shifting here is for illustration: note that it isn’t really necessary to shift both freq and F arrays simply
to plot one against the other.

Figure 6.14 The Fourier Transform of a noisy waveform with two frequency components plotted
against frequency.



Figure 6.15 The positive-frequency components of the Fourier transform of a noisy waveform, 
normalized to show their intensities. 


Now, because our input function is real, its Fourier transform is Hermitian: the
negative frequency components are the complex conjugates of the positive frequency
components so they don’t contain any further information. Therefore, we only need to deal
with the first half of the F array. Plotted against its (positive) frequencies as an amplitude
spectrum (Figure 6.15):

In [x]: spec = 2/n * np.abs(F[:n//2])    # ➊
In [x]: pylab.plot(freq[:n//2], spec, 'k')
In [x]: pylab.xlabel('Frequency /Hz')
In [x]: pylab.show()

➊ Note that because of the way this DFT has been defined, a normalization factor of 2/n
is required to faithfully regenerate the original amplitudes of each component.

The amplitudes of the 10 Hz and 50 Hz signals are easily resolved in this spectrum.

The inverse Fourier Transform defined through

$$f_m = \frac{1}{n} \sum_{k=0}^{n-1} F_k \exp\left(\frac{2\pi i mk}{n}\right), \quad m = 0, 1, 2, \cdots, n-1$$

is returned by the method np.fft.ifft.

If, as mentioned earlier, the input function array is real and only the non-negative
frequency components are needed, the np.fft methods rfft, irfft, rfftfreq can
be used.
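
As a sketch of how the real-input routines fit together, the example waveform can be
transformed with rfft. The noise term is left out here (an assumption made purely so
that the recovered amplitudes are exact), and the time grid is built from integer sample
numbers; the names spec and f_back are illustrative.

```python
import numpy as np

fsamp = 500
n = fsamp                          # one second of data
t = np.arange(n) / fsamp
f = 2 * np.sin(2*np.pi*10*t) + np.sin(2*np.pi*50*t)   # noise-free for clarity

F = np.fft.rfft(f)                 # only the n//2 + 1 non-negative frequencies
freq = np.fft.rfftfreq(n, 1/fsamp)
spec = 2/n * np.abs(F)             # same normalization as in the text

print(freq[10], spec[10])          # the 10 Hz component
print(freq[50], spec[50])          # the 50 Hz component

f_back = np.fft.irfft(F, n)        # the inverse transform recovers the signal
```

Passing n to irfft tells it the length of the original array, which cannot otherwise be
deduced unambiguously from the half-length transform.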


6.8.2 Two-dimensional Fast Fourier Transforms

Discrete Fourier transforms and their inverses in two and higher dimensions are possible
using the np.fft methods fft2, ifft2, fftn and ifftn. The two-dimensional DFT
is defined as

$$F_{jk} = \sum_{p=0}^{m-1} \sum_{q=0}^{n-1} f_{pq} \exp\left[-2\pi i \left(\frac{pj}{m} + \frac{qk}{n}\right)\right],$$
$$j = 0, 1, 2, \cdots, m-1; \quad k = 0, 1, 2, \cdots, n-1,$$

and higher dimensions follow similarly.
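
Because the exponential in this definition factorizes into a part depending on p and a
part depending on q, the two-dimensional transform is equivalent to one-dimensional
transforms taken along each axis in turn. A brief sketch with an arbitrary random test
array (the names F2 and F_sep are illustrative):

```python
import numpy as np

f = np.random.random_sample((4, 6))          # arbitrary m x n test array

F2 = np.fft.fft2(f)
# The double sum separates into one-dimensional DFTs over each axis in turn.
F_sep = np.fft.fft(np.fft.fft(f, axis=0), axis=1)

print(np.allclose(F2, F_sep))
print(np.allclose(np.fft.ifft2(F2), f))      # ifft2 undoes fft2
```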


Example E6.21 The two-dimensional DFT is widely used in image processing. 36 For 
example, multiplying the DFT of an image by a two-dimensional Gaussian function 
is a common way to blur an image by decreasing the magnitude of its high-frequency 
components. 

The following code produces an image of randomly arranged squares and then blurs 
it with a Gaussian filter. 

Listing 6.11 Blurring an image with a Gaussian filter

# eg6-fft2-blur.py
import numpy as np
import pylab

# image size, square side length, number of squares
ncols, nrows = 120, 120
sq_size, nsq = 10, 20

# The image array (0=background, 1=square) and boolean array of allowed places
# to add a square so that it doesn't touch another or the image sides
image = np.zeros((nrows, ncols))
sq_locs = np.zeros((nrows, ncols), dtype=bool)
sq_locs[1:-sq_size-1, 1:-sq_size-1] = True

def place_square():
    """ Place a square at random on the image and update sq_locs. """
    # valid_locs is an array of the indexes of True entries in sq_locs
    valid_locs = np.transpose(np.nonzero(sq_locs))
    # pick one such entry at random, and add the square so its top left
    # corner is there; then update sq_locs
    i, j = valid_locs[np.random.randint(len(valid_locs))]
    image[i:i+sq_size, j:j+sq_size] = 1
    imin, jmin = max(0, i-sq_size-1), max(0, j-sq_size-1)
    sq_locs[imin:i+sq_size+1, jmin:j+sq_size+1] = False

# Add the required number of squares to the image
for i in range(nsq):
    place_square()
pylab.imshow(image)
pylab.show()

# Take the two-dimensional DFT and center the frequencies
ftimage = np.fft.fft2(image)
ftimage = np.fft.fftshift(ftimage)
pylab.imshow(np.abs(ftimage))
pylab.show()

# Build and apply a Gaussian filter.
sigmax, sigmay = 10, 10
cy, cx = nrows/2, ncols/2
x = np.linspace(0, nrows, nrows)
y = np.linspace(0, ncols, ncols)
X, Y = np.meshgrid(x, y)
gmask = np.exp(-(((X-cx)/sigmax)**2 + ((Y-cy)/sigmay)**2))

ftimagep = ftimage * gmask
pylab.imshow(np.abs(ftimagep))
pylab.show()

# Finally, take the inverse transform and show the blurred image
imagep = np.fft.ifft2(ftimagep)
pylab.imshow(np.abs(imagep))
pylab.show()

36 Note that there is an entire SciPy subpackage, scipy.ndimage, not described in this book, devoted
to image processing. This example serves simply to illustrate the syntax and format of NumPy’s
two-dimensional FFT implementation.

The results are shown in Figure 6.16. 
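
The effect of the filter can also be checked numerically without plotting. The sketch
below is a reduced variant of the listing and not a replacement for it: it assumes a
small 32 × 32 image containing a single square, builds the mask from integer frequency
indexes so that it is exactly 1 at zero frequency, and undoes the shift with ifftshift
before inverting. The value of sigma is an arbitrary choice.

```python
import numpy as np

n = 32
image = np.zeros((n, n))
image[10:14, 20:24] = 1                       # a single 4 x 4 square

# Frequency indexes after fftshift run from -n//2 to n//2 - 1.
k = np.arange(n) - n // 2
KX, KY = np.meshgrid(k, k)
sigma = 3                                      # filter width (assumed value)
gmask = np.exp(-(KX**2 + KY**2) / sigma**2)    # equals 1 at zero frequency

ft = np.fft.fftshift(np.fft.fft2(image))
blurred = np.fft.ifft2(np.fft.ifftshift(ft * gmask))

print(abs(blurred.imag).max())                 # negligible: the result is real
print(image.sum(), blurred.real.sum())         # total intensity is preserved
print(image.max(), blurred.real.max())         # but the peak is reduced
```

Since the zero-frequency component is left untouched, the blurred image has the same
total intensity as the original, only spread over a larger area.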


6.8.3 Exercises 

Questions 

Q6.8.1 Compare the speed of execution of NumPy’s np.fft.fft algorithm and that
of the direct implementation of Equation 6.1.

Hint: treat the direct equation as a matrix multiplication (dot product) of an array of n
function values (random ones will do) with the n × n array with entries $\exp(-2\pi i mk/n)$
($m, k = 0, 1, \cdots, n-1$). Use IPython’s %timeit magic.

Figure 6.16 Blurring an image with a Gaussian filter applied to its two-dimensional Fourier
transform.

Problems


P6.8.1 Consider a signal in the time domain defined by the function

$$f(t) = \cos(2\pi\nu t) e^{-t/\tau},$$

with frequency $\nu$ = 250 Hz decaying exponentially with a lifetime $\tau$ = 0.2 s. Plot
the function, sampled at 1,000 Hz, and its discrete Fourier transform against frequency.
Examine, by means of a suitable plot, the effect of apodization on the DFT by truncating
the time sequence after (a) 0.5 s, (b) 0.2 s.


P6.8.2 A square wave of period T may be defined through the following function: 


f_\mathrm{sq}(t) = \begin{cases} 1 & 0 \le t < T/2 \\ -1 & T/2 \le t < T \end{cases} 

with f_sq(t) = f_sq(t + nT) for n = ±1, ±2, ... 

Plot the square wave with T = 1 (and hence cycle frequency, ν = 1) for 0 ≤ t < 2, 
taking a grid of 2,048 time points over this interval. Calculate and plot its discrete 
Fourier transform. 

The Fourier expansion of this function is the infinite series 

f_\mathrm{sq}(t) = \frac{4}{\pi} \sum_{k=1}^{\infty} \frac{\sin[2\pi(2k-1)\nu t]}{2k-1} 


Compare the square wave function with this Fourier expansion truncated at 3, 9 and 
18 terms. Also compare their (suitably normalized) Fourier transforms: the missing 




frequencies in each truncated series should appear as zeros in its Fourier transform, 
whereas the present terms will have intensities 4/[π(2k - 1)]. 


P6.8.3 The scipy library provides a routine for reading in .wav files as NumPy 
arrays: 

In [x]: from scipy.io import wavfile 

In [x]: sample_rate, wav = wavfile.read(<filename>) 


For a stereo file, the array wav has shape (n, 2) where n is the number of samples. 

Use the routines of np.fft to identify the chords present in the sound file chords.wav, 
which may be downloaded from scipython.com/ex/afi . Which major chord do 
they comprise? 

The frequencies of musical notes on an equal-tempered scale for which A4 = 440 Hz 
are provided as a dictionary in the file notes.py. 








7 Matplotlib 


Matplotlib is probably the most popular Python package for plotting data. It can be 
used through the procedural interface pylab in very quick scripts to produce simple 
visualizations of data (see Chapter 3) but, as described in this chapter, with care it 
can also produce high-quality figures for journal articles, books and other publications. 
Although there is some limited functionality for producing three-dimensional plots (see 
Section 7.2.3), it is primarily a two-dimensional plotting library. 


7.1 Matplotlib basics 

Matplotlib is a large package organized in a hierarchy: at the highest level is the 
matplotlib.pyplot module. This provides a “state-machine environment” with a 
similar interface to MATLAB and allows the user to add plot elements (data points, 
lines, annotations, etc.) through simple function calls. This is the interface used by 
pylab, which was introduced in Chapter 3. 

At a lower level, which allows more advanced and customizable use, Matplotlib has 
an object-oriented interface that allows one to create a figure object to which one or 
more axes objects are attached. Most plotting, annotation and customization then occurs 
through these axes objects. This is the approach we adopt in this chapter. 

To use Matplotlib in this way, we use the following recommended imports: 

import matplotlib.pyplot as plt 
import numpy as np 


7.1.1 Basic figures 

Plotting on a single axes object 

The top-level object, containing all the elements of a plot, is called Figure. To create 
a figure object, call plt.figure. No arguments are necessary, but optional customization 
can be specified by setting the values described in Table 7.1. For example, 

In [x]: # a default figure, with title "Figure 1" 

In [x]: fig = plt.figure() 

In [x]: # a small figure with red background 

In [x]: fig = plt.figure('Population density', figsize=(4.5, 2.), 

   ....:                  facecolor='red') 

Table 7.1 Arguments to plt.figure 


Argument     Description 

num          An identifier for the figure: if none is provided, an integer, starting at 1, 
             is used and incremented with each figure created. Alternatively, using a 
             string will set the window title to that string when the figure is displayed 
             with plt.show(). 
figsize      A tuple of figure (width, height), unfortunately in inches. 
dpi          Figure resolution in dots-per-inch. 
facecolor    Figure background color. 
edgecolor    Figure border color. 


To actually plot data, we need to create an Axes object: a region of the figure 
containing the axes, tick marks, labels, plot lines and markers, and so on. The simplest 
figure, consisting of a single Axes object, is created and returned with 

In [x]: ax = fig.add_subplot(111) 

The argument 111 here is a commonly used abbreviation for the tuple (1, 1, 1), specifying 
subplot 1 of a figure with 1 row and 1 column of subplots (see Section 7.1.3). The 
Axes object, ax, is the one on which we can actually plot the data with ax.plot. The 
essential features of this plot method were described in Chapter 3. Here, however, we 
note that the plot method actually returns a list of objects representing the plotted 
lines. In its simplest usage, only a single line is plotted, and so this list consists of one 
Line2D object that we may assign to a variable if desired. As a full example, consider 
the following comparison of the catenary y = cosh(x) and its parabolic approximation, 
y = 1 + x^2/2. 

import matplotlib.pyplot as plt 
import numpy as np 

fig = plt.figure() 

ax = fig.add_subplot(111) 

x = np.linspace(-2, 2, 1000) 

line_cosh, = ax.plot(x, np.cosh(x))    # ❶ 
line_quad, = ax.plot(x, 1 + x**2 / 2) 

plt.show() 

❶ Note the syntax line_cosh, = ... to assign the returned line object to the variable 
line_cosh rather than the list containing that object. 

The two plotted lines are shown in Figure 7.1. 

Plot limits 

By default, Matplotlib plots all of the data passed to plot and sets the axis limits 
accordingly. To set the axis limits to something else, use the ax.set_xlim and ax.set_ylim 
methods. Either both limits can be set or an individual limit can be set with the 
arguments left, right (or xmin, xmax) and bottom, top (or ymin, ymax). Unspecified 
limits are left unchanged. For example, 
Table 7.2 Matplotlib line styles 


''      (no line) 
'-'     solid 
'--'    dashed 
':'     dotted 
'-.'    dash-dot 


Figure 7.1 A simple plot of two lines on a single Axes object. 


x = np.linspace(-3,3,1000) 
y = x**3 +2 * x**2 - x - 1 
fig = plt.figure() 
ax = fig.add_subplot(111) 
ax.plot(x,y) 


ax.set_xlim(-1,2) # x-limits are -1 to 2 

ax.set_ylim(bottom=0) # ymin=0: plot will be "clipped" at the bottom 

If bottom is greater than top or right less than left, the corresponding axis will be 
reversed; that is, values on this axis will decrease from left to right (or from bottom to 
top) (see Exercise P7.1.5). 

If you wish to invert the axis direction without changing the limit values, the method 
calls ax.invert_xaxis() and ax.invert_yaxis() will do that for you. 
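
As a brief illustration of these methods, the sketch below sets the limits on the cubic plotted above and then inverts the x-axis. The variable names limits_before, limits_after and is_inverted are chosen here for illustration only, and the Agg backend is selected merely so that the code runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit for on-screen plots
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-3, 3, 1000)
y = x**3 + 2 * x**2 - x - 1

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y)

ax.set_xlim(-1, 2)       # set both x-limits
ax.set_ylim(bottom=0)    # set the lower y-limit only
limits_before = ax.get_xlim()

ax.invert_xaxis()        # same limit values, opposite direction
limits_after = ax.get_xlim()
is_inverted = ax.xaxis_inverted()
```

After the call to invert_xaxis, get_xlim reports the limits in swapped order, which is equivalent to having called ax.set_xlim(2, -1) directly.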


Line styles, markers and colors 

As with pylab, the plot style can be specified by passing extra arguments to the plot () 
method. The default line style is a solid, 1.0 pt weight line in a color determined by the 
order in which it is added to the plot. 

An alternative line style can be selected from the predefined options with the 
linestyle (or simply ls) argument. Possible string values to pass to this argument 
(including the empty string for plotting no line) are shown in Table 7.2. 
Table 7.3 Matplotlib color code letters 


b    blue 
g    green 
r    red 
c    cyan 
m    magenta 
y    yellow 
k    black 
w    white 


Further customization is possible by setting the dashes argument to a sequence of 
values describing the repeated dash pattern in points. For example, 
dashes=[2, 4, 8, 4, 2, 4] represents a pattern of dot (2 pts), space (4 pts), dash (8 pts), 
space (4 pts), dot (2 pts), space (4 pts) to be repeated as the line style. Equivalently, one 
can call a plotted line’s set_dashes method, as in the following code snippet: 

x = np.linspace(-np.pi, np.pi, 1000) 
line, = plt.plot(x, np.sin(x)) 

line.set_dashes([2, 4, 8, 4, 2, 4]) # dot-dash-dot 

The line weight is customized by setting the linewidth (or simply lw) argument 
to a number of points. 

Line colors are specified with the color (or simply c) argument used in one of 
several ways: 

• string: by letter or name, one of the values given in Table 7.3. 

• string: by HTML 6-digit hex-string preceded by '#', for example '#ffff00' is 
yellow. 

• string: a string representation of a float between 0. and 1. (for example '0.4') 
gives a gray-scale between black (0.) and white (1.). 

• tuple of floats between 0. and 1.: RGB components, for example (0.5, 0., 0.) 
is a dark red color. 
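
One way to check how a given color specifier is interpreted is to convert it to an RGB tuple. The sketch below assumes Matplotlib 2.0 or later, which provides matplotlib.colors.to_rgb; the list specs and the dictionary rgb are names introduced here for illustration.

```python
from matplotlib import colors

# The same red given by letter, by name and by hex string, followed by
# a gray-scale string and an RGB tuple
specs = ['r', 'red', '#ff0000', '0.4', (0.5, 0., 0.)]

# Map each specifier (as text) to the RGB tuple Matplotlib resolves it to
rgb = {str(spec): colors.to_rgb(spec) for spec in specs}

for spec, value in rgb.items():
    print(spec, '->', value)
```

The first three specifiers resolve to the same tuple, and the gray-scale string gives equal red, green and blue components.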

By default, the Line2D object created by calling plot on an Axes object does not 
include markers: symbols printed at each point on the plot. To add them, specify one of 
the single-character marker codes given in Table 7.4 using the marker argument: 

ax.plot(x, y, marker='v')    # downward pointing triangles 

Other marker properties can be set with the arguments listed in Table 7.5. 

Matplotlib markers can be further customized; see the documentation for details.¹ 


Table 7.4 Some Matplotlib marker styles (single-character string codes) 


Code    Marker    Description 

.       ·         point 
o       ○         circle 
+       +         plus 
x       ×         x 
D       ◇         diamond 
v       ▽         downward triangle 
^       △         upward triangle 
s       □         square 
*       ★         star 


Table 7.5 Matplotlib marker properties 


Argument           Abbreviation    Description 

markersize         ms              Marker size, in points 
markevery                          Set to a positive integer, N, to print a marker every N 
                                   points; the default, None, prints a marker for every 
                                   point 
markerfacecolor    mfc             Fill color of the marker 
markeredgecolor    mec             Edge color of the marker 
markeredgewidth    mew             Edge width of the marker, in points 


¹ http://matplotlib.org/api/markers_api.html. 


Scatterplots 

A typical two-dimensional scatterplot depicts the data as points on a Cartesian axes 
system. Sometimes there is no meaningful or helpful ordering to the data and so no 
need to join data points by lines. The pyplot.scatter function creates a scatterplot. 
In addition to one-dimensional sequences of x- and y-data, as for pyplot.plot, 
the data point marker colors and sizes can be set individually by passing a sequence of 
appropriate values of the same length as the data to the arguments c and s respectively. 
The marker sizes are in points^2 (points squared) so that their area is proportional to the 
values passed to s. Manipulating the size of the markers is a common way of indicating 
a third dimension to the data, as in the following example. 


Example E7.1 To explore the correlation between birth rate, life expectancy and per 
capita income, we may use a scatterplot. Note that the marker sizes are set in proportion 
to the countries’ per capita GDP but have to be scaled a little so they don’t get too large 
(see Figure 7.2). 

Listing 7.1 Scatterplot of demographic data for eight countries 

# eg7-scatter.py 
import numpy as np 
import matplotlib.pyplot as plt 

countries = ['Brazil', 'Madagascar', 'S. Korea', 'United States', 
             'Ethiopia', 'Pakistan', 'China', 'Belize'] 

# Birth rate per 1000 population 
birth_rate = [16.4, 33.5, 9.5, 14.2, 38.6, 30.2, 13.5, 23.0] 
# Life expectancy at birth, years 
life_expectancy = [73.7, 64.3, 81.3, 78.8, 63.0, 66.4, 75.2, 73.7] 
# Per person income fixed to US Dollars in 2000 
GDP = np.array([4800, 240, 16700, 37700, 230, 670, 2640, 3490]) 

fig = plt.figure() 
ax = fig.add_subplot(111) 

# Some arbitrary colors: 
colors = range(len(countries)) 
ax.scatter(birth_rate, life_expectancy, c=colors, s=GDP/10) 
ax.set_xlabel('Birth rate per 1000 population') 
ax.set_ylabel('Life expectancy at birth (years)') 

plt.show() 


Figure 7.2 A scatterplot with variable marker sizes indicating each country’s GDP. 


Gridlines 

Gridlines are vertical (for the x-axis) and horizontal (for the y-axis) lines running across 
the plot to aid with locating the numerical values of data points. By default no gridlines 
are drawn, but they may be turned on by calling the grid method on an Axes object (to add 
both horizontal and vertical gridlines) or on the xaxis or yaxis objects of a given Axes 
(to select the gridlines to use). For example, 

ax.yaxis.grid(True)    # Turn on horizontal gridlines 

or 

ax.grid(True)    # Turn on all gridlines 

The line properties of the gridlines are set with the linestyle, linewidth, color, 
etc. arguments as for plot lines. 

Two sorts of gridlines correspond to the major and minor tick marks (see below): 
these can be selected with the which argument, which takes the values 'major', 
'minor' and 'both'. The default (if not specified) is which='major'. 

ax.xaxis.grid(True, which='minor', c='b')    # Minor x-axis gridlines in blue 
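
Putting these options together, the following sketch draws major and minor gridlines in different styles on a sine curve. The plotted data, the chosen styles and the in-memory buffer used for saving are illustrative assumptions; the Agg backend is selected only so that the code runs without a display.

```python
import io
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit for on-screen plots
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 500)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, np.sin(x))

# Minor gridlines only appear if minor ticks are enabled
ax.minorticks_on()
ax.grid(True, which='major', linestyle='-', linewidth=1, color='0.5')
ax.grid(True, which='minor', linestyle=':', linewidth=0.5, color='0.8')

# Render the figure to an in-memory PNG rather than a window
buffer = io.BytesIO()
fig.savefig(buffer, format='png')
n_bytes = buffer.getbuffer().nbytes
```

Using a lighter color and thinner line for the minor gridlines keeps them from competing visually with the major ones.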


Log scales 

By default, Matplotlib plots data on a linear scale. To set a logarithmic scale, call one 
or both of the following on your Axes object: 

ax.set_xscale('log') 
ax.set_yscale('log') 

Base-10 logarithms are used by default, but the (integer) base can be set with the 
optional arguments basex or basey (in Matplotlib 3.3 and later, a single argument 
named base is used instead). Nonpositive values in the data will be masked 
as invalid by default. If you want negative values to be handled "symmetrically" with 
positive ones, such that log(-|x|) = -log(|x|), then use 'symlog' instead of 'log'. 
See also Question 7.1.1. 
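
For instance, the sketch below plots a cubic on log-log axes and, alongside it, the same function over negative and positive values using a 'symlog' y-axis. The side-by-side subplot layout (codes 121 and 122), the data ranges and the names ax_log and ax_sym are illustrative choices.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit for on-screen plots
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()

# Left: positive data only, logarithmic scale on both axes
ax_log = fig.add_subplot(121)
x = np.linspace(1, 100, 200)
ax_log.plot(x, x**3)
ax_log.set_xscale('log')
ax_log.set_yscale('log')

# Right: data of both signs, symmetric-log scale on the y-axis
ax_sym = fig.add_subplot(122)
xs = np.linspace(-100, 100, 401)
ax_sym.plot(xs, xs**3)
ax_sym.set_yscale('symlog')
```

On the log-log axes the cubic appears as a straight line of slope 3; the symlog axes keep the negative branch visible instead of masking it.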

Adding titles, labels and legends 

Axis labels may be added to the subplot Axes object with ax.set_xlabel and 
ax.set_ylabel. 

Plot line legend labels are defined by passing the label argument to the plot 
call. However, the legend itself will not appear unless legend is called on the 
plot Axes object (e.g., with ax.legend()). The appearance of the legend itself can 
be customized extensively, but the most common additional argument you may wish to 
pass to legend is loc, defining the location of the legend on the plot (see Table 3.1). 

There are two types of title you may want to give your figure: fig.suptitle adds a 
centered title to the entire figure, which may contain more than one subplot; ax.set_title 
adds a title to a single subplot.² 
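
A compact sketch of these calls, reusing the catenary comparison from earlier in the chapter, is given here. The label strings, the title text and the name legend_labels are illustrative; the Agg backend is selected only so that the code runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit for on-screen plots
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-2, 2, 100)

fig = plt.figure()
ax = fig.add_subplot(111)

# The label argument sets the text used for each line in the legend
ax.plot(x, np.cosh(x), label='cosh(x)')
ax.plot(x, 1 + x**2 / 2, label='1 + x^2/2')

ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Catenary and parabola')    # title for this subplot
fig.suptitle('Figure-level title')       # title for the whole figure

# The legend is only drawn once legend() is called
legend = ax.legend(loc='upper center')
legend_labels = [text.get_text() for text in legend.get_texts()]
```

The legend entries appear in the order in which the lines were added to the Axes.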


Example E7.2 The data read in from the file eg7-marriage-ages.txt, which can 
be downloaded from scipython.com/eg/aag , giving the median age at first marriage 
in the United States for 13 decades since 1890, are plotted by the program below. Grid 
lines are turned on for both axes with ax.grid(), and custom markers are used for the 
data points themselves (see Figure 7.3). 


Listing 7.2 The median age at first marriage in the US over time 

# eg7-marriage-ages.py 
import numpy as np 
import matplotlib.pyplot as plt 

year, age_m, age_f = np.loadtxt('eg7-marriage-ages.txt', unpack=True, skiprows=3) 

fig = plt.figure() 
ax = fig.add_subplot(111) 

# Plot ages with male or female symbols as markers 
ax.plot(year, age_m, marker='$\u2642$', markersize=14, c='blue', lw=2, 
        mfc='blue', mec='blue') 
ax.plot(year, age_f, marker='$\u2640$', markersize=14, c='magenta', lw=2, 
        mfc='magenta', mec='magenta') 
ax.grid() 

ax.set_xlabel('Year') 
ax.set_ylabel('Age') 
ax.set_title('Median age at first marriage in the US, 1890 - 2010') 
plt.show() 


Figure 7.3 Median age at first marriage in the US, 1890-2010. 


² See the documentation at http://matplotlib.org/api/legend_api.html for more details. 


Example E7.3 The historical populations of five US cities are given in the files 
boston.tsv, houston.tsv, detroit.tsv, san_jose.tsv, phoenix.tsv as tab-separated 
columns of (year, population). They can be downloaded from scipython.com/eg/aaf . 

The following program plots these data on one set of axes with a different line style 
for each. 


Listing 7.3 The populations of five US cities over time 

# eg7-populations.py 
import matplotlib.pyplot as plt 
import numpy as np 

fig = plt.figure() 
ax = fig.add_subplot(111) 

cities = ['Boston', 'Houston', 'Detroit', 'San Jose', 'Phoenix'] 

# line styles: solid, dashes, dots, dash-dots, and dot-dot-dash 
linestyles = [{'ls': '-'}, {'ls': '--'}, {'ls': ':'}, {'ls': '-.'}, 
              {'dashes': [2, 4, 2, 4, 8, 4]}] 

for i, city in enumerate(cities): 
    filename = '{}.tsv'.format(city.lower()).replace(' ', '_')    # ❶ 
    yr, pop = np.loadtxt(filename, unpack=True) 
    line, = ax.plot(yr, pop/1.e6, label=city, c='k', **linestyles[i]) 
ax.legend(loc='upper left') 
ax.set_xlim(1800, 2020) 
ax.set_xlabel('Year') 
ax.set_ylabel('Population (millions)') 
plt.show() 


❶ Note how the city name is used to deduce the corresponding filename. 

The plot produced is shown in Figure 7.4. 


Figure 7.4 Population trends for five US cities. 


Font properties 

The text elements of a plot (titles, legend, axis labels, etc.) can be customized with the 
arguments given in Table 7.6. For example, 

ax.set_title('Plot Title', fontsize=18, fontname='Times New Roman', color='blue') 

To use the same font properties for all text elements, it is easiest to set Matplotlib’s 
rc settings using a dictionary of values. This involves a separate import from pyplot 
first:³ 


³ It is also possible to edit Matplotlib's configuration file, matplotlibrc, to set many kinds of plot 
preferences: see http://matplotlib.org/users/customizing.html. 
Table 7.6 Font property arguments for text elements of a plot 


Argument      Description 

fontsize      The size of the font in points (e.g., 12, 16) 
fontname      The font name (e.g., 'Courier', 'Arial') 
family        The font family (e.g., 'sans-serif', 'cursive', 'monospace') 
fontweight    The font weight (e.g., 'normal', 'bold') 
fontstyle     The font style (e.g., 'normal', 'italic') 
color         Any Matplotlib color specifier (e.g., 'r', '#ff00ff') 


from matplotlib import rc 
font_properties = {'family': 'monospace', 
                   'weight': 'bold', 
                   'size': 22} 
rc('font', **font_properties)    # ❶ 
# All text will now be rendered in 22-point, bold monospace in plots 

❶ Recall that the syntax **kwargs takes the (key, value) pairs of the dictionary kwargs 
and passes them to a function as keyword arguments (see Section 4.2.2). 
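
Because rc settings persist for the rest of the session, it can be useful to inspect them and to restore the defaults afterwards. The sketch below uses plt.rc (pyplot's counterpart of the rc function), the plt.rcParams dictionary and plt.rcdefaults; the variable names holding the inspected values are introduced here for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit for on-screen plots
import matplotlib.pyplot as plt

font_properties = {'family': 'monospace', 'weight': 'bold', 'size': 22}

# Apply the font settings to all subsequently created text elements
plt.rc('font', **font_properties)
size_after_rc = plt.rcParams['font.size']
weight_after_rc = plt.rcParams['font.weight']

# Restore Matplotlib's built-in default settings
plt.rcdefaults()
size_after_reset = plt.rcParams['font.size']
```

Each keyword passed to rc for the 'font' group is stored under a key of the form 'font.<name>' in rcParams.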

Tick marks 

Matplotlib does its best to label representative values (tick marks) on each axis 
appropriately, but there are some occasions when you want to customize them, for example, 
to make the tick marks more or less frequent, or to label them differently. 

Most commonly, one simply wants to set the tick mark values to a given sequence of 
values: this is accomplished by calling ax.set_xticks and ax.set_yticks on the 
Axes object of the plot. For example, 

ax.set_xticks([0, 1, 3.5, 6.5, 15]) 

Note that the ticks do not have to be evenly spaced. 

To replace the actual numbered labels, pass a sequence of strings of a suitable length 
to ax.set_xticklabels and ax.set_yticklabels, as in the following example.⁴ 


Example E7.4 The following program plots the exponential decay described by 
y = N e^{-t/τ}, labeled by lifetimes (nτ for n = 0, 1, ...) such that after each lifetime 
the value of y falls by a factor of e. The plot is given as Figure 7.5. 

Listing 7.4 Exponential decay illustrated in terms of lifetimes 

# eg7-ticks-exp-decay.py 
import numpy as np 
import matplotlib.pyplot as plt 

# Initial value of y at t=0, lifetime in s 
N, tau = 10000, 28 

# Maximum time to consider (s) 
tmax = 100 

# A suitable grid of time points, and the exponential decay itself 
t = np.linspace(0, tmax, 1000) 
y = N * np.exp(-t/tau) 

fig = plt.figure() 
ax = fig.add_subplot(111) 
ax.plot(t, y) 

# The number of lifetimes that fall within the plotted time interval 
ntau = tmax // tau + 1 

# xticks at 0, tau, 2*tau, ..., (ntau-1)*tau; yticks at the corresponding y-values 
xticks = [i*tau for i in range(ntau)] 
yticks = [N * np.exp(-i) for i in range(ntau)] 
ax.set_xticks(xticks) 
ax.set_yticks(yticks) 

# xtick labels: 0, tau, 2tau, ... 
xtick_labels = [r'$0$', r'$\tau$'] + [r'${}\tau$'.format(k) for k in range(2,ntau)]    # ❶ 
ax.set_xticklabels(xtick_labels) 

# corresponding ytick labels: N, N/e, N/e^2, ... 
ytick_labels = [r'$N$',r'$N/e$'] + [r'$N/e^{}$'.format(k) for k in range(2,ntau)]    # ❷ 
ax.set_yticklabels(ytick_labels) 

ax.set_xlabel(r'$t\;/\mathrm{s}$') 
ax.set_ylabel(r'$y$') 
ax.grid() 
plt.show() 


❶ The x-axis tick labels are 0, τ, 2τ, ... 

❷ The y-axis tick labels are N, N/e, N/e^2, ... 

Note that the length of the sequence of tick labels must correspond to that of the list 
of tick values required. 


Figure 7.5 An exponential decay with customized tick labels. 


⁴ Note that setting the tick labels directly in this way decouples your plot from its data to some extent. An 
entire module, matplotlib.ticker, is devoted to the configuration of tick locating and formatting: its 
API is beyond the scope of this book but is well described at http://matplotlib.org/api/ticker_api.html. 
Table 7.7 Common arguments to ax.tick_params 


Argument      Description 

axis          Which axis to customize: 'x', 'y', or 'both'. Default is 'both'. 
which         Which tick mark set to customize: 'major', 'minor', or 'both'. 
              Default is 'major'. 
direction     Tick mark direction: 'in', 'out', or 'inout'. Default is 'in'. 
length        Length of the tick marks in points. 
width         Width of the tick marks in points. 
pad           Distance between the tick mark and its label in points. 
labelsize     Tick label size in points. 
color         Tick mark color (a Matplotlib specifier). 
labelcolor    Tick mark label color (a Matplotlib specifier). 


To remove the tick labels altogether, set them to the empty list, for example 

ax.set_yticklabels([]) 

This retains the tick marks themselves. If you want neither tick marks nor tick labels on 
the axis, use: 

ax.set_yticks([]) 

There are two kinds of ticks: major ticks and minor ticks. Only major ticks are turned 
on by default; the smaller and more frequent minor ticks can most easily be enabled with 

ax.minorticks_on() 

More advanced customization of tick marks and their labels, including showing minor 
tick marks for one axis only, can be achieved using the ax.tick_params convenience 
function, which takes the arguments described in Table 7.7. 

Finally, ax.xaxis and ax.yaxis have a method, set_ticks_position, which 
takes a single argument used to determine where the ticks appear: for ax.xaxis, 
'top', 'bottom', 'both' (the default) or 'none'; for ax.yaxis, 'left', 'right', 
'both' (the default) or 'none'. 
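
The following sketch combines several of these tick-related calls on one Axes object. The plotted curve and the particular lengths, widths and sizes are arbitrary illustrative values; the Agg backend is selected only so that the code runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit for on-screen plots
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 15, 300)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(t, np.exp(-t / 5))

# Major ticks at chosen, unevenly spaced positions on the x-axis
ax.set_xticks([0, 1, 3.5, 6.5, 15])

# Enable minor ticks, then style the major and minor sets separately
ax.minorticks_on()
ax.tick_params(which='major', length=8, width=1.5, direction='inout',
               labelsize=12)
ax.tick_params(which='minor', length=4, width=1, direction='in')

# Draw ticks on the bottom and left edges of the plot only
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')

major_xticks = list(ax.get_xticks())
```

Note that get_xticks returns only the major tick positions unless called with minor=True.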


Example E7.5 The following program creates a plot with both major and minor tick 
marks, customized to be thicker and wider than the default, where the major tick marks 
point into and out of the plot area. 

Listing 7.5 Customized tick marks 

# eg7-tick-customization.py 

import numpy as np 

import matplotlib.pyplot as plt 

# A selection of functions on rn abscissa points for 0 <= x < 1 
rn = 100 
rx = np.linspace(0, 1, rn, endpoint=False) 

def tophat(rx): 
    """ Top hat function: y = 1 for x < 0.5, y=0 for x >= 0.5 """ 
    ry = np.ones(rn) 
    ry[rx>=0.5]=0 
    return ry 

# A dictionary of functions to choose from 
ry = {'half-sawtooth': lambda rx: rx.copy(), 
      'top-hat': tophat, 
      'sawtooth': lambda rx: 2 * np.abs(rx-0.5)} 

# Repeat the chosen function nrep times 
nrep = 4 
x = np.linspace(0, nrep, nrep*rn, endpoint=False) 
y = np.tile(ry['top-hat'](rx), nrep)    # ❶ 

fig = plt.figure() 
ax = fig.add_subplot(111) 
ax.plot(x,y, 'k', lw=2) 

# Add a bit of padding around the plotted line to aid visualization 
ax.set_ylim(-0.1,1.1) 
ax.set_xlim(x[0]-0.5, x[-1]+0.5) 

# Customize the tick marks and turn the grid on 
ax.minorticks_on() 
ax.tick_params(which='major', length=10, width=2, direction='inout') 
ax.tick_params(which='minor', length=5, width=2, direction='in') 
ax.grid(which='both') 
plt.show() 


❶ The np.tile function constructs an array by repeating a given array nrep times. To 
plot a different periodic function, choose 'half-sawtooth' or 'sawtooth' here. 

The resulting plot is shown in Figure 7.6. 



Figure 7.6 A periodic function plotted on a graph with gridlines and customized tick marks. 
Table 7.8 Common arguments to ax.errorbar 


Argument      Description 

x, y          The data to plot 
yerr, xerr    Errors on the y and x data coordinates, as described in the text 
fmt           The plot format symbol (marker for the data point); set to None or the 
              empty string, '', to display only the error bars 
ecolor        A Matplotlib color specifier for the error bars; the default, None, uses 
              the same color as the connecting line between data markers 
elinewidth    The width of the error bar lines in points; use None to use the same 
              linewidth as the plotted data 
capsize       The length of the error bar caps, in points 
errorevery    A positive integer giving the subsampling for the error bars; for example, 
              errorevery=10 draws error bars on every 10th data point only 


Error bars 

To produce a plotted line with error bars, use the method plt.errorbar instead of 
plt.plot. In addition to the usual arguments of the plot function, errorbar allows 
the specification of errors in the x- and y-coordinates by passing the following types 
of value to the arguments xerr and yerr: 

• None: No error bars for this coordinate; 

• A scalar (e.g., xerr=0.2): all values are associated with symmetric error bars at 
plus and minus this value (i.e., ±0.2); 

• An array-type object of length n or shape (n, 1) (e.g., yerr=[0.1, 0.15, 
0.1]): the symmetric error bars are plotted at plus and minus the values in this 
sequence for each of the n data points (i.e., ±0.1, ±0.15, ±0.1); 

• An array-type object of shape (2, n) (i.e., two rows for each of n data points): 
error bars, which may be asymmetric, are plotted using minus-values from the 
first row and plus-values from the second. 

The appearance of the error bars may be customized using the arguments summarized 
in Table 7.8. For example, 


# Some data 
x = np.array([0.3, 0.5, 0.7, 0.9]) 
y = np.array([1., 2., 2.5, 3.9]) 

# Constant, symmetric errors of +/- 0.05 on x-data 
xerr = 0.05 

# Asymmetric, variable errors on y-data 
yerr = np.array([[0.1, 0.25, 0.5, 0.4], 
                 [0.1, 0.15, 0.2, 0. ]]) 

ax.errorbar(x, y, yerr, xerr, fmt='o', ls='') 


Example E7.6 Before fledging, some species of birds lose weight relative to the 
surface area of their wings to maximize their aerodynamic efficiency. The file 
fledging-data.csv, available at scipython.com/eg/aad , gives wing-loading values 
(body mass per wing area) as averages for two broods of swifts in the two weeks prior 
to fledging, with their uncertainties.⁵ 

In the program below, we perform a weighted fit to the data and plot it, with error 
bars. 

Listing 7.6 Wing loading variation in swifts prior to fledging 

# eg7-fledging.py 
import numpy as np 
import matplotlib.pyplot as plt 

# Read in the data: day before fledging, wing loading and error for two broods 
dt = np.dtype([('day', 'i2'), ('wl1', 'f8'), ('wl1-err', 'f8'), 
               ('wl2', 'f8'), ('wl2-err', 'f8')]) 
data = np.loadtxt('fledging-data.csv', dtype=dt, delimiter=',') 

# Weighted fit of exponential decay to the data. This is a linear least squares 
# problem because y = Aexp(-Bx) => ln y = ln A - Bx = mx + c 
p1_fit = np.poly1d(np.polyfit(data['day'], np.log(data['wl1']), 1,    # ❶ 
                              w=np.log(data['wl1'])**-2)) 
p2_fit = np.poly1d(np.polyfit(data['day'], np.log(data['wl2']), 1, 
                              w=np.log(data['wl2'])**-2)) 
wl1fit = np.exp(p1_fit(data['day'])) 
wl2fit = np.exp(p2_fit(data['day'])) 

# Plot the data points with their uncertainties and the fits 
fig = plt.figure() 
ax = fig.add_subplot(111) 

# wl1 data: white circles, black borders, with error bars 
ax.errorbar(data['day'], data['wl1'], yerr=data['wl1-err'], ls='', marker='o', 
            color='k', mfc='w', mec='k') 
ax.plot(data['day'], wl1fit, 'k', lw=1.5) 

# wl2 data: black filled circles, with error bars 
ax.errorbar(data['day'], data['wl2'], yerr=data['wl2-err'], ls='', marker='o', 
            color='k', mfc='k', mec='k') 
ax.plot(data['day'], wl2fit, 'k', lw=1.5) 

ax.set_xlim(15,0) 
ax.set_ylim(0.003, 0.012) 
ax.set_xlabel('days pre-fledging') 
ax.set_ylabel(r'wing loading ($\mathrm{g\,mm^{-2}}$)') 
plt.show() 


❶ The data points are weighted in the fit by 1/σ^2 where σ is the estimated one-standard 
deviation error of the measurement. 

Figure 7.7 shows the results of the fit. The broods, initially with different average 
wing-loading values, are seen to converge prior to fledging. 
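
The linearization used in this fit can be checked on synthetic data for which the answer is known. The sketch below is not the program above: the values of A_true and B_true and the 5% uncertainties are invented for illustration, and the weights follow one common convention, namely error propagation (the uncertainty on ln y is approximately σ_y / y) combined with the recommendation in NumPy's polyfit documentation to pass w = 1/σ for Gaussian uncertainties.

```python
import numpy as np

# Synthetic, noise-free data following y = A exp(-B x); A_true and B_true
# are assumed values chosen for this illustration
A_true, B_true = 0.012, 0.08
x = np.arange(1, 15)
y = A_true * np.exp(-B_true * x)
sigma_y = 0.05 * y    # assumed 5% uncertainty on each y value

# Uncertainty on ln y by error propagation: sigma_lny ~ sigma_y / y
sigma_lny = sigma_y / y

# Straight-line fit ln y = m x + c, weighted by 1/sigma on the fitted quantity
m, c = np.polyfit(x, np.log(y), 1, w=1 / sigma_lny)

# Recover the parameters of the exponential from the slope and intercept
A_fit, B_fit = np.exp(c), -m
```

With noise-free data the fitted parameters reproduce A_true and B_true to within rounding error, whatever weights are used; with real measurements the choice of weights determines how strongly the less certain points influence the fit.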


⁵ J. Wright et al., Proc. R. Soc. B 273, 1895 (2006). 


Figure 7.7 Fitted time series for wing-loading values in two cohorts of swift nestlings. 

7.1.2 Bar charts and pie charts 

Bar charts and histograms 

The basic pyplot function for plotting a bar chart is ax.bar, which makes a plot of 
rectangular bars defined by their left edges and height. For example, 

ax.bar([0, 1, 2], [40, 80, 20]) 

The width of the rectangles is, by default, 0.8 but can be set with the (third) width 
argument. If you want the vertical bars centered on the given x-coordinates, either set 
the argument align to 'center' or calculate where their left edges should be: 

w = 0.5 
x, y = np.array([0, 1, 2]), np.array([40, 80, 20]) 
ax.bar(x, y, w, align='center')    # easiest way of centering the bars 
ax.bar(x - w/2, y, w)              # or calculate the left edges 

Additional arguments, including the provision of error bars, are given in Table 7.9. 

By default, ax.bar produces a vertical bar chart. Horizontal bar charts are catered for 
either by setting orientation='horizontal' or by using the analogous ax.barh 
method. 
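
As an illustration of the horizontal variant, the sketch below draws three centered horizontal bars with ax.barh and inspects the first rectangle. The category names, values and variable names are invented for this example; the Agg backend is selected only so that the code runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit for on-screen plots
import matplotlib.pyplot as plt
import numpy as np

categories = ['A', 'B', 'C']    # hypothetical category labels
values = [40, 80, 20]
positions = np.arange(len(values))

fig = plt.figure()
ax = fig.add_subplot(111)

# For barh the data values set the bar lengths (widths) and the
# height argument sets the bar thickness
bars = ax.barh(positions, values, height=0.5, align='center')
ax.set_yticks(positions)
ax.set_yticklabels(categories)

# Each bar is a rectangle whose geometry can be inspected
first = bars[0]
first_length, first_thickness = first.get_width(), first.get_height()
first_lower_edge = first.get_y()
```

With align='center' each bar extends half its thickness either side of its position, so the first bar, at position 0, has its lower edge at -0.25.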


Table 7.9 Common arguments to ax.bar and barh 


Argument       Description 

left           A sequence of x-coordinates of the left edges of the bars (but see align) 
height         A sequence of heights for the bars 
width          Width of the bars. If a scalar, all bars have the same width; can be array-like 
               for variable widths 
bottom         The y-coordinates of the bottom of the bars 
color          Colors of the bar faces (scalar or array-like) 
edgecolor      Colors of the bar edges (scalar or array-like) 
linewidth      Line widths of the bar edges, in points (scalar or array-like) 
xerr, yerr     Error bar limits, as for errorbar (scalar or array-like) 
error_kw       A dictionary of keyword arguments for customizing the 
               appearance of the error bars (see Table 7.8) 
align          The default, 'edge', aligns the bars by their left edges (for vertical bars) or 
               bottom edges (for horizontal bars); 'center' centers the bars on this axis 
               instead 
log            Set to True to use a logarithmic axis scale 
orientation    'vertical' (the default) or 'horizontal' 
hatch          Set the hatching pattern for the bars: one of '/', '\', '|', '-', 
               '+', 'x', 'o', 'O', '*'. Repeat the character for a denser 
               pattern 


Example E7.7 The following program produces a bar chart of letter frequencies in the 
English language, estimated by analysis of the text of Moby-Dick.⁶ The vertical bars are 
centered and labeled by letter (Figure 7.8). 

Listing 7.7 Letter frequencies in the text of Moby-Dick. 

# eg7-charfreq.py 
import numpy as np 
import matplotlib.pyplot as plt 

text_file = 'moby-dick.txt' 

letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 
# Initialize the dictionary of letter counts: {'A': 0, 'B': 0, ...} 
lcount = dict([(l, 0) for l in letters]) 

# Read in the text and count the letter occurrences 
for l in open(text_file).read(): 
    try: 
        lcount[l.upper()] += 1 
    except KeyError: 
        # Ignore characters that are not letters 
        pass 
# The total number of letters 
norm = sum(lcount.values()) 

fig = plt.figure() 
ax = fig.add_subplot(111) 

# The bar chart, with letters along the horizontal axis and the calculated 
# letter frequencies as percentages as the bar height 
x = range(26) 
ax.bar(x, [lcount[l]/norm * 100 for l in letters], width=0.8, 
       color='g', alpha=0.5, align='center') 
ax.set_xticks(x) 
ax.set_xticklabels(letters) 
ax.tick_params(axis='x', direction='out') 
ax.set_xlim(-0.5, 25.5) 
ax.yaxis.grid(True) 
ax.set_ylabel('Letter frequency, %') 
plt.show() 


Figure 7.8 Letter frequencies in the novel Moby-Dick. 


⁶ See, for example, www.gutenberg.org/ebooks/2701 for a free text file of this novel. 


For monochrome plots, it is sometimes preferable to distinguish bars by patterns. The
hatch argument can be used to do this, using any of several predefined patterns (see
Table 7.9) as illustrated in the example below.


Example E7.8 The file germany-energy-sources.txt, available at
scipython.com/eg/aae, contains data on the renewable sources of electricity produced
in Germany from 1990 to 2013:

Renewable electricity generation in Germany in GWh (million kWh)
Year    Hydro    Wind     Biomass    Photovoltaics
2013    21200    49800    47800      29300
2012    21793    50670    43350      26380
2011    17671    48883    37603      19559

The program below plots these data as a stacked bar chart, using Matplotlib’s hatch
patterns to distinguish between the different sources (Figure 7.9).

Listing 7.8 Visualizing renewable electricity generation in Germany

# eg7-germany-alt-energy.py
import numpy as np
import matplotlib.pyplot as plt

# Read the data as floats so that it can be rescaled in place below
data = np.loadtxt('germany-energy-sources.txt', skiprows=2, dtype='f8')
years = data[:,0]
n = len(years)

# GWh to TWh
data[:,1:] /= 1000

fig = plt.figure()
ax = fig.add_subplot(111)

sources = ('Hydroelectric', 'Wind', 'Biomass', 'Photovoltaics')
hatch = ['o', '', 'xxxx', '**']
bottom = np.zeros(n)
bars = [None]*n

for i, source in enumerate(sources):
    # (1)
    bars[i] = ax.bar(years, bottom=bottom, height=data[:,i+1], color='w',
                     hatch=hatch[i], align='center')
    bottom += data[:,i+1]

ax.set_xticks(years)
plt.xticks(rotation=90)
ax.set_xlim(1989, 2014)
ax.set_ylabel('Renewable Electricity (TWh)')
ax.set_title('Renewable Electricity Generation in Germany, 1990-2013')

# (2)
plt.legend(bars, sources, loc='best')
plt.show()

(1) To include a legend, each bar chart object 7 must be stored in a list, bars, which
(2) is passed to plt.legend with a corresponding sequence of labels, sources.

Figure 7.9 Stacked bar chart of renewable energy generation in Germany, 1990-2013.


7 Actually a Container of artists.

Table 7.10 Common arguments to ax.pie

Argument        Description
colors          A sequence of Matplotlib color specifiers for coloring the
                segments
labels          A sequence of strings for labeling the segments
explode         A sequence of values specifying the fraction of the pie chart
                radius to offset each wedge by (0 for no explode effect)
shadow          True or False: specifies whether to draw an attractive shadow
                under the pie
startangle      Rotate the “start” of the pie chart by this number of degrees
                counterclockwise from the horizontal axis
autopct         A format string to label the segments by their percentage
                fractional value, or a function for generating such a string
                from the data
pctdistance     The radial position of the autopct text, relative to the pie
                radius. The default is 0.6 (i.e., within the pie, which can be
                awkward for narrow segments)
labeldistance   The radial position of the label text, relative to the pie
                radius; the default is 1.1 (just outside the pie)
radius          The radius of the pie (the default is 1); this is useful when
                creating overlapping pie charts with different radii


Pie charts 

It is straightforward to draw a pie chart in Matplotlib by passing an array of values to
ax.pie. The values will be normalized by their sum if this sum is greater than 1, or
otherwise treated directly as fractions. Labels, percentages, “exploded” segments and
other effects are handled as described in Table 7.10 and illustrated in the following
example.


Example E7.9 The following program depicts the emissions of greenhouse gases by 
mass of “carbon equivalent” (data from the 2007 IPCC report). 8 

Listing 7.9 Pie chart of greenhouse gas emissions 

# eg7-pie.py
import numpy as np
import matplotlib.pyplot as plt

# Annual greenhouse gas emissions, billion tons carbon equivalent (GtCe)
gas_emissions = np.array([(r'$\mathrm{CO_2}$-d', 2.2),
                          (r'$\mathrm{CO_2}$-f', 8.0),
                          ('Nitrous\nOxide', 1.0),
                          ('Methane', 2.3),
                          ('Halocarbons', 0.1)],
                         dtype=[('source', 'U17'), ('emission', 'f4')])

# 5 colors beige
colors = ['#C7B299', '#A67C52', '#C69C6E', '#754C24', '#534741']
# (1)
explode = [0, 0, 0.1, 0, 0]

fig, ax = plt.subplots()
ax.axis('equal')    # So our pie looks round!
# (2)
ax.pie(gas_emissions['emission'], colors=colors, shadow=True, startangle=90,
       explode=explode, labels=gas_emissions['source'], autopct='%.1f%%',
       pctdistance=1.15, labeldistance=1.3)

plt.show()

(1) The segment corresponding to nitrous oxide has been “exploded” by 10%.

(2) The percentage values are formatted to one decimal place (autopct='%.1f%%').
The resulting pie chart is shown in Figure 7.10.

Figure 7.10 Greenhouse gas emissions by percentage for five different sources. CO2-d denotes
CO2 emissions from deforestation; CO2-f denotes CO2 emissions from fossil fuel burning.

8 IPCC (2007), Climate Change 2007: Synthesis Report. Contribution of Working Groups I, II and III to
the Fourth Assessment Report of the Intergovernmental Panel on Climate Change [Core Writing Team,
Pachauri, R. K and Reisinger, A. (eds.)]. Geneva, Switzerland: IPCC, 104 pp.


7.1.3 Multiple subplots 

To create a figure with more than one subplot (that is, Axes), call add_subplot on your
Figure object, setting its argument to indicate where the subplot should be placed. Each
call returns an Axes object. Single figures with more than 10 subplots are uncommon,
so the usual argument is a three-digit number whose digits give, in order, the number of
rows, the number of columns and the subplot number. The subplot number increases along
the columns in each row and then down the rows. For example, a figure consisting of three
rows of two columns of subplots can be constructed by adding Axes objects:

In [x]: fig = plt.figure()
In [x]: ax1 = fig.add_subplot(321)    # top left subplot
In [x]: ax2 = fig.add_subplot(322)    # top right subplot
In [x]: ax3 = fig.add_subplot(323)    # middle left subplot
In [x]: ax6 = fig.add_subplot(326)    # bottom right subplot

Alternatively, to create a figure and add all its subplots to it at the same time, call
plt.subplots, which takes arguments nrows and ncols (in addition to those listed
in Table 7.1) and returns a Figure and an array of Axes objects, which can be indexed
for each individual axis:

In [x]: fig, axes = plt.subplots(nrows=3, ncols=2)
In [x]: axes.shape
Out[x]: (3, 2)
In [x]: ax1 = axes[0,0]    # top left subplot
In [x]: ax2 = axes[2,1]    # bottom right subplot

In fact, a useful idiom to create a plot with a single Axes object is to call subplots()
with no arguments:

In [x]: fig, ax = plt.subplots() 

In [x]: ax.plot(x,y) # no need to index the single Axes object created 


Plots with subplots run the risk of their labels, titles and ticks overlapping each other -
if this happens, call the method tight_layout on the Figure object and Matplotlib
will do its best to arrange them so that there is sufficient space between them.
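The correspondence between the three-digit add_subplot argument and the indices of the array returned by plt.subplots can be sketched as follows. The helper subplot_number is a hypothetical name introduced here for illustration, and the Agg backend is chosen only so that the snippet runs without opening a window.

```python
import matplotlib
matplotlib.use('Agg')   # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt

nrows, ncols = 3, 2

def subplot_number(row, col, ncols):
    """Hypothetical helper: the 1-based subplot number of axes[row, col]."""
    return row * ncols + col + 1

fig, axes = plt.subplots(nrows=nrows, ncols=ncols)
for row in range(nrows):
    for col in range(ncols):
        k = subplot_number(row, col, ncols)
        # Title each subplot with the equivalent add_subplot call
        axes[row, col].set_title('add_subplot({}{}{})'.format(nrows, ncols, k))

# Make room for the titles and tick labels of neighboring subplots
fig.tight_layout()
```

With this numbering, axes[0, 0] corresponds to subplot 321 and axes[2, 1] to subplot 326.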


Example E7.10 Consider a metal bar of cross-sectional area, A, initially at a uniform
temperature, θ0, which is heated instantaneously at the exact center by the addition of an
amount of energy, H. The subsequent temperature of the bar (relative to θ0) as a function
of time, t, and position, x, is governed by the one-dimensional diffusion equation, which
has the solution

$$\theta(x, t) = \frac{H}{c_p A} \frac{1}{\sqrt{4\pi D t}} \exp\left(-\frac{x^2}{4Dt}\right),$$

where c_p and D are the metal’s specific heat capacity per unit volume and thermal
diffusivity (which we assume are constant with temperature). The following code plots
θ(x, t) for three specific times and compares the plots between two metals, with different
thermal diffusivities but similar heat capacities, copper and iron.

Listing 7.10 The one-dimensional diffusion equation applied to the temperature of two different metal
bars

# eg7-diffusion1d.py
import numpy as np
import matplotlib.pyplot as plt

# Cross-sectional area of bar in m2, heat added at x=0 in J
A, H = 1.e-4, 1.e3
# Temperature in K at t=0
theta0 = 300

# Metal element symbol, specific heat capacities per unit volume (J.m-3.K-1),
# Thermal diffusivities (m2.s-1) for Cu and Fe
metals = np.array([('Cu', 3.45e7, 1.11e-4), ('Fe', 3.50e7, 2.3e-5)],
                  dtype=[('symbol', '|S2'), ('cp', 'f8'), ('D', 'f8')])

# The metal bar extends from -xlim to xlim (m)
xlim, nx = 0.05, 1000
x = np.linspace(-xlim, xlim, nx)

# Calculate the temperature distribution at these three times
times = (1e-2, 0.1, 1)

# Create our subplots: three rows of times, one column for each metal
fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(7, 8))
for j, t in enumerate(times):
    for i, metal in enumerate(metals):
        symbol, cp, D = metal
        ax = axes[j, i]
        # The solution to the diffusion equation
        theta = theta0 + H/cp/A/np.sqrt(4*np.pi*D*t) * np.exp(-x**2/4/D/t)
        # Plot, converting distances to cm and add some labeling
        ax.plot(x*100, theta, 'k')
        ax.set_title('{}, $t={}$ s'.format(symbol.decode('utf8'), t))
        ax.set_xlim(-4, 4)
        ax.set_xlabel(r'$x\;/\mathrm{cm}$')
        ax.set_ylabel(r'$\Theta\;/\mathrm{K}$')

# Set up the y axis so that each metal has the same scale at the same t
for j in (0,1,2):
    ymax = max(axes[j,0].get_ylim()[1], axes[j,1].get_ylim()[1])
    print(axes[j,0].get_ylim(), axes[j,1].get_ylim())
    for i in (0,1):
        ax = axes[j,i]
        ax.set_ylim(theta0, ymax)
        # Ensure there are only three y-tick marks
        ax.set_yticks([theta0, (ymax + theta0)/2, ymax])

# We don't want the subplots to bash into each other: tight_layout() fixes this
fig.tight_layout()
plt.show()

Because copper is a better conductor, the temperature increase is seen to spread more
rapidly for this metal (see Figure 7.11).
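One way to check a point-source solution of the one-dimensional diffusion equation is to confirm that it conserves energy: integrating c_p A θ(x, t) along the bar should return the heat added, H, at any time. The sketch below does this numerically for the standard Gaussian solution, θ = H/(c_p A) (4πDt)^(-1/2) exp(-x²/4Dt). The parameter values are copper-like numbers assumed for illustration, and the integration range is an arbitrary choice that is much wider than the heated region.

```python
import numpy as np

# Parameters assumed for illustration (copper-like values)
A, H = 1.e-4, 1.e3          # cross-sectional area (m2), heat added (J)
cp, D = 3.45e7, 1.11e-4     # heat capacity (J.m-3.K-1), diffusivity (m2.s-1)
t = 0.1                     # time after heating (s)

# A grid far wider than the heated region, so the tails are negligible
x = np.linspace(-0.5, 0.5, 100001)
dx = x[1] - x[0]

# Temperature rise above the initial temperature at time t
dtheta = H / (cp * A) / np.sqrt(4 * np.pi * D * t) * np.exp(-x**2 / (4 * D * t))

# Energy stored in the bar: a simple Riemann sum of cp * A * dtheta over x
energy = np.sum(cp * A * dtheta) * dx
print(energy)
```

The printed energy should agree closely with H; a different normalization of the Gaussian would not pass this check.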


To further customize the subplot spacing, call fig.subplots_adjust(). This
method takes any of the keywords left, bottom, right, top, wspace and hspace,
which can be set to fractional values of the figure’s height and width as appropriate
to determine the positions of the subplots’ left side (default 0.125), right side (0.9),
bottom (0.1), top (0.9), horizontal spacing (wspace, 0.2) and vertical spacing
(hspace, 0.2). A practical use of this function is to create “ganged” subplots that share
a common axis, as in the following example.


Figure 7.11 Numerical solutions of the one-dimensional diffusion equation for the temperatures
of two metal bars.

Example E7.11 This code generates a figure of 10 subplots depicting the graph of
sin(nπx) for n = 1, 2, ..., 10. The subplot spacing is configured so that they “run into”
each other vertically (see Figure 7.12).

Listing 7.11 Ten subplots with zero vertical spacing

import numpy as np
import matplotlib.pyplot as plt

nrows = 10
fig, axes = plt.subplots(nrows,1)
# Zero vertical space between subplots
fig.subplots_adjust(hspace=0)

x = np.linspace(0,1,1000)
for i in range(nrows):
    # n=nrows for the top subplot, n=1 for the bottom subplot
    n = nrows - i
    axes[i].plot(x, np.sin(n * np.pi * x), 'k', lw=2)
    # We only want ticks on the bottom of each subplot
    axes[i].xaxis.set_ticks_position('bottom')
    if i < nrows-1:
        # Set ticks at the nodes (zeros) of our sine functions
        axes[i].set_xticks(np.arange(0, 1, 1/n))
        # We only want labels on the bottom subplot xaxis
        axes[i].set_xticklabels('')
    axes[i].set_yticklabels('')
plt.show()

Figure 7.12 Ten subplots of sin(nπx) for n = 1, 2, ..., 10 adjusted to remove vertical space between
them.


7.1.4 Annotations 

Matplotlib provides several ways to add different kinds of annotation to your plots. The 
most important methods for adding text, arrows, lines and shapes are described below. 

Adding text 

The method ax.text(x, y, s) is a basic method used to add a text string s at position
(x, y) (in data coordinates) to the axes. The font properties can be determined by
additionally passing a dictionary of (keyword, value) pairs to fontdict (see Table 7.6).
Individual keyword arguments (such as fontsize=20) can also be used to customize
the font in this way.

If the text annotation refers to a feature of the data, you will usually want the default
behavior, placing it using data coordinates so that it maintains the same relative position
to the data even if the plot limits are changed. If, instead, you want to place the text
in axis coordinates (such that (0,0) is the lower left of the axes and (1,1) is the upper
right), set the keyword argument transform=ax.transAxes where ax is the Axes
object the coordinates refer to.
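The difference between the two coordinate systems can be seen in a short sketch such as the one below. The plotted function, label strings and positions are arbitrary choices made for illustration, and the Agg backend is selected only so that the code runs without a display.

```python
import matplotlib
matplotlib.use('Agg')   # assumption: render off-screen, no window needed
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = np.linspace(0, 2 * np.pi, 200)
ax.plot(x, np.sin(x), 'k')

# Data coordinates (the default): this label stays attached to the maximum
# of the curve at (pi/2, 1)
peak_label = ax.text(np.pi / 2, 1, 'maximum', ha='center', va='bottom')

# Axes coordinates: this label stays near the lower-left corner of the axes
corner_label = ax.text(0.02, 0.02, 'y = sin(x)', fontsize=12,
                       transform=ax.transAxes)

# Changing the limits moves the first label with the data, but not the second
ax.set_ylim(-2, 2)
```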

Arrows and text 

The ax.annotate method is similar to ax.text (although with an annoyingly different
syntax) but draws an arrow from the text to a specified point in the plot. The
important arguments to ax.annotate are

• s, the string to output as a text label;

• xy, a tuple, (x, y), giving the coordinates of the position to annotate (i.e., where
the arrow points to);

• xytext, a tuple, (x, y), giving the coordinates of the text label (i.e., where the
arrow points from);

• xycoords, an optional string determining the type of coordinates referred to by
the argument xy: several options are available, 9 but the most commonly used
ones are

- 'data': data coordinates, the default,

- 'figure fraction': fractional coordinates of the figure size ((0,0) is
lower left, (1,1) is upper right),

- 'axes fraction': fractional coordinates of the axes ((0,0) is lower
left, (1,1) is upper right);

• textcoords: as for xycoords, an optional string determining the type of coordinates
referred to by xytext. An additional value is permitted for this string:
'offset points' specifies that the tuple xytext is an offset in points from
the xy position;

• arrowprops: if present, determines the properties and style of the arrow drawn
between xytext and xy (see below).

Additional keyword arguments are interpreted as properties of the Text object 
produced as the label (e.g., fontsize and color). An important pair is 
verticalalignment (or va) and horizontalalignment (or ha) which determine 
how the label is aligned relative to its xytext position. Valid values are 'center',
'right', 'left', 'top', 'bottom' and 'baseline' as appropriate. 
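As a small sketch of the 'offset points' option, the snippet below places a label at a fixed offset from the annotated point, whatever the scale of the axes. The data, label text and offsets are invented for illustration, and the Agg backend is selected only so that the code runs without a display.

```python
import matplotlib
matplotlib.use('Agg')   # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], 'o-')

# The arrow points to the data point (1, 1); the label is placed 40 points to
# the right of it and 30 points below it
note = ax.annotate('a data point', xy=(1, 1), xycoords='data',
                   xytext=(40, -30), textcoords='offset points',
                   arrowprops={'arrowstyle': '->'},
                   ha='left', va='center')
```

Because the offset is measured in points rather than in data units, the label keeps the same distance from the point when the plot limits change.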

In its simplest usage, ax.annotate just adds a text label to the plot (without an
arrow). For example, 

ax.annotate('My Label', xy=(0.5,0.8), fontsize=16, xycoords='axes fraction', 
ha='center') 

which adds 'My Label' at the center, near the top of the axes in 16-point text. Note 
that if there is no arrow or line, xytext is not necessary and the label is placed directly 
at xy. 

The argument arrowprops is a dictionary determining the style of line or arrow 
joining the label at xytext to the specified xy point. There is a somewhat bewildering 
array of possible items to put in this dictionary, but the important ones are illustrated by 
the following example. 


Example E7.12 The following program produces a plot with eight arrows with different 
styles (Figure 7.13). 

Listing 7.12 Annotations with arrows in Matplotlib 


# eg7-arrows.py
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = np.linspace(0,1)
ax.plot(x, x, 'o')

ax.annotate('default line', xy=(0.15,0.1), xytext=(0.6,0.1),
            arrowprops={'arrowstyle': '-'}, va='center')
ax.annotate('dashed line', xy=(0.25,0.2), xytext=(0.6,0.2),
            arrowprops={'arrowstyle': '-', 'ls': 'dashed'}, va='center')
ax.annotate('default arrow', xy=(0.35,0.3), xytext=(0.6,0.3),
            arrowprops={'arrowstyle': '->'}, va='center')
ax.annotate('thick blue arrow', xy=(0.45,0.4), xytext=(0.6,0.4),
            arrowprops={'arrowstyle': '->', 'lw': 4, 'color': 'blue'},
            va='center')
ax.annotate('double-headed arrow', xy=(0.45,0.5), xytext=(0.01,0.5),
            arrowprops={'arrowstyle': '<->'}, va='center')
ax.annotate('arrow with closed head', xy=(0.55,0.6), xytext=(0.1,0.6),
            arrowprops={'arrowstyle': '-|>'}, va='center')
ax.annotate('a really thick red arrow\nwith not much space', xy=(0.65,0.7),
            xytext=(0.1,0.7), va='center', multialignment='right',
            arrowprops={'arrowstyle': '-|>', 'lw': 8, 'ec': 'r'})
ax.annotate('a really thick red arrow\nwith space between\nthe tail and the'
            ' label', xy=(0.85,0.9), xytext=(0.1,0.9), va='center',
            multialignment='right',
            arrowprops={'arrowstyle': '-|>', 'lw': 8, 'ec': 'r', 'shrinkA': 10})
plt.show()

Figure 7.13 An example of different arrow styles.

9 See the documentation at http://matplotlib.org/api/text_api.html#matplotlib.text.Annotation.


Example E7.13 Another example of an annotated plot, this time of the share price of
BP plc (LSE: BP) with a couple of notable events added to it. The necessary data for
this example can be downloaded from Yahoo! Finance. 10

10 https://uk.finance.yahoo.com/q/hp?s=BP.L.

Listing 7.13 eg7-share-prices

import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime

# (1)
def date_to_int(s):
    epoch = datetime(year=1970, month=1, day=1)
    date = datetime.strptime(s, '%Y-%m-%d')
    return (date - epoch).days

def bindate_to_int(bs):
    # Decode the field if np.loadtxt passes it as a byte string
    s = bs.decode('ascii') if isinstance(bs, bytes) else bs
    return date_to_int(s)

dt = np.dtype([('daynum','i8'), ('close', 'f8')])
share_price = np.loadtxt('bp-share-prices.csv', skiprows=1, delimiter=',',
                         usecols=(0,4), converters={0: bindate_to_int},
                         dtype=dt)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(share_price['daynum'], share_price['close'], c='g')
# (2)
ax.fill_between(share_price['daynum'], 0, share_price['close'], facecolor='g',
                alpha=0.5)

daymin, daymax = share_price['daynum'].min(), share_price['daynum'].max()
ax.set_xlim(daymin, daymax)
price_max = share_price['close'].max()

def get_xy(date):
    """ Return the (x,y) coordinates of the share price on a given date. """
    x = date_to_int(date)
    return share_price[np.where(share_price['daynum']==x)][0]

# A horizontal arrow and label
x,y = get_xy('1999-10-01')
ax.annotate('Share split', (x,y), xytext = (x+1000,y), va='center',
            arrowprops=dict(facecolor='black', shrink=0.05))

# A vertical arrow and label
x,y = get_xy('2010-04-20')
ax.annotate('Deepwater Horizon\noil spill', (x,y), xytext = (x,price_max*0.9),
            arrowprops=dict(facecolor='black', shrink=0.05), ha='center')

years = range(1989,2015,2)
ax.set_xticks([date_to_int('{:4d}-01-01'.format(year)) for year in years])
# (3)
ax.set_xticklabels(years, rotation=90)

plt.show()

(1) We need to do some work to read in the date column: first decode the byte string read
in from the file to ASCII (bindate_to_int), then use datetime (see Section 4.5.3)
to convert it into an integer number of days since some reference date (epoch): here we
choose the Unix epoch, 1 January 1970 (date_to_int).

(2) ax.fill_between fills the region below the plotted line with a single color.

(3) We rotate the year labels so there’s enough room for them (reading bottom to top).
Figure 7.14 shows the plotted chart.

Figure 7.14 BP plc’s share price on an annotated chart.


Lines and span rectangles 

Adding an arbitrary straight line to a Matplotlib plot can be achieved by simply plotting
the data corresponding to its start and end points with ax.plot; for example,

ax.plot([x1, x2], [y1, y2], color='k', lw=2)

draws a line between (x1, y1) and (x2, y2). Of course, this approach would be
tedious for drawing a large number of disconnected lines, so for horizontal and
vertical lines there is a pair of convenient methods, ax.hlines and ax.vlines.
ax.hlines takes mandatory arguments y, xmin, xmax and draws horizontal lines with
y-coordinates at each of the values given by the sequence y (if y is passed as a scalar,
a single line is drawn). xmin and xmax specify the start and end of each line; they can
be scalars (in which case all the lines will have the same start and end x-coordinates) or
a sequence (with one value for each of the y-coordinates specified by y). ax.vlines
draws vertical lines; its mandatory arguments, x, ymin and ymax, are entirely analogous.


Example E7.14 The code below illustrates some different uses of ax.vlines and 
ax.hlines (see Figure 7.15). 

Listing 7.14 Some different ways to use ax.vlines and ax.hlines

# eg7-circle-lines.py
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.axis('equal')

# A circle made of horizontal lines
y = np.linspace(-1,1,100)
xmax = np.sqrt(1 - y**2)
ax.hlines(y, -xmax, xmax, color='g')

# Draw a box of thicker lines around the circle
ax.vlines(-1, -1, 1, lw=2, color='r')
ax.vlines(1, -1, 1, lw=2, color='r')
ax.hlines(-1, -1, 1, lw=2, color='r')
ax.hlines(1, -1, 1, lw=2, color='r')

# Some evenly spaced vertical lines
ax.vlines(y[::10], -1, 1, color='b')

# Remove tick marks and labels
ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)

# A bit of padding around the outside of the box
ax.set_xlim(-1.1,1.1)
ax.set_ylim(-1.1,1.1)

plt.show()

Figure 7.15 A figure generated from vertical and horizontal lines.


On static plots such as figures for printing, ax.hlines and ax.vlines work well,
but note that the line limits don’t change upon changing the axes’ limits in an interactive
plot. There are two further methods, ax.axhline and ax.axvline, which simply plot
a horizontal or vertical line across the axis, whatever its current limits. axhline takes
arguments y, xmin, xmax, but these must be scalar values (so multiple lines require
repeated calls) and xmin, xmax are given in fractional coordinates such that 0 is the far
left of the plot and 1 the far right. Again, the axvline arguments, x, ymin and ymax, are
analogous. Some examples:

ax.axhline(100, 0, 1)    # Horizontal line across whole of x-axis at y = 100.
ax.axhline(100)          # Same thing: xmin and xmax default to 0 and 1
# A thick, blue, dashed vertical line at x = 5, around the center of the y-axis
ax.axvline(5, 0.4, 0.6, c='b', lw=4, ls='--')

Table 7.11 Keyword arguments for styling patches

Argument         Description
alpha            Set the alpha transparency (0-1)
color            Set both the facecolor and the edgecolor of the patch
edgecolor, ec    Set the edge (border) color
facecolor, fc    Set the patch face color
fill             Indicate whether to fill the patch or not (True or False)
hatch            Set the hatching pattern for the patch: one of '/', '\', '|',
                 '-', '+', 'x', 'o', 'O', '*'. Repeat the character for a
                 denser pattern
linestyle, ls    Set the patch line style: 'solid', 'dashed', 'dashdot', 'dotted'
linewidth, lw    Set the patch line width, in points

The methods ax.axhspan and ax.axvspan are similar but produce a horizontal or
vertical spanning rectangle across the axis. ax.axhspan is passed arguments ymin, 
ymax (in data coordinates), and xmin, xmax (in fractional axes units). ax.axvspan 
takes the arguments xmin, xmax, ymin and ymax analogously. Extra keywords can be 
used to style the spanning rectangle (see Table 7.11). 
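A minimal sketch of both methods, assuming an arbitrary sine curve as the underlying plot, might look like the following. The band limits, colors and hatch are illustrative choices, and the Agg backend is selected only so that the code runs without a display.

```python
import matplotlib
matplotlib.use('Agg')   # assumption: render off-screen, no window needed
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
t = np.linspace(0, 10, 500)
ax.plot(t, np.sin(t), 'k')

# A horizontal band between y = -0.5 and y = 0.5 (data coordinates), spanning
# the full width of the axes (xmin and xmax default to 0 and 1)
band = ax.axhspan(-0.5, 0.5, facecolor='g', alpha=0.3)

# A vertical band between x = 2 and x = 4 (data coordinates), covering only
# the lower half of the axes (ymin=0, ymax=0.5 in fractional axes units)
strip = ax.axvspan(2, 4, 0, 0.5, facecolor='b', alpha=0.3, hatch='/')
```

Both calls return a patch, so the styling keywords of Table 7.11 apply to them directly.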


Example E7.15 The program below annotates a simple wave plot to indicate the different
regions of the electromagnetic spectrum, using text, axvline, axhline and
axvspan (see Figure 7.16).

Listing 7.15 A representation of the electromagnetic spectrum, 250-1,000 nm

# eg7-annotate.py
import numpy as np
import matplotlib.pyplot as plt

# wavelength range, nm
lmin, lmax = 250, 1000
x = np.linspace(lmin, lmax, 1000)
# A wave with a smoothly increasing wavelength
wv = (np.sin(10 * np.pi * x / (lmax+lmin-x)))[::-1]

fig = plt.figure()
ax = fig.add_subplot(111, axisbg='k')
ax.plot(x, wv, c='w', lw=2)
ax.set_xlim(250,1000)
ax.set_ylim(-2,2)

# Label and delimit the different regions of the electromagnetic spectrum
ax.text(310, 1.5, 'UV', color='w', fontdict={'fontsize': 20})
ax.text(530, 1.5, 'Visible', color='k', fontdict={'fontsize': 20})
ax.annotate('', (400, 1.3), (750, 1.3), arrowprops={'arrowstyle': '<|-|>',
                                                    'color': 'w', 'lw': 2})
ax.text(860, 1.5, 'IR', color='w', fontdict={'fontsize': 20})
ax.axvline(400, -2, 2, c='w', ls='--')
ax.axvline(750, -2, 2, c='w', ls='--')
# Horizontal "axis" across the center of the wave
ax.axhline(c='w')
# Ditch the y-axis ticks and labels; label the x-axis
ax.yaxis.set_visible(False)
ax.set_xlabel(r'$\lambda\;/\mathrm{nm}$')

# Finally, add some colorful rectangles representing a rainbow in the
# visible region of the spectrum.
# Dictionary mapping of wavelength regions (nm) to approximate RGB values
rainbow_rgb = { (400, 440): '#8b00ff', (440, 460): '#4b0082',
                (460, 500): '#0000ff', (500, 570): '#00ff00',
                (570, 590): '#ffff00', (590, 620): '#ff7f00',
                (620, 750): '#ff0000'}
for wv_range, rgb in rainbow_rgb.items():
    ax.axvspan(*wv_range, color=rgb, ec='none', alpha=1)
plt.show()

Figure 7.16 A representation of the electromagnetic spectrum.


7.1.5 ◊ Circles, ellipses, rectangles and other patches

Almost everything that gets rendered in a Matplotlib figure is a subclass of the abstract
base class, Artist. This includes lines (through Line2D) and text (through Text). 11
An important collection of rendered objects is further derived from the Artist subclass
Patch: a two-dimensional shape. The wedges of a pie chart (Section 7.1.2) and the
arrows of an annotation (Section 7.1.4) are examples we have met before.

11 In fact, there are two kinds of Artist: primitives and containers. Primitives are the graphical objects
themselves (such as Line2D) and containers are the elements of a figure onto which they are rendered (for
example, Axes).

To add a shape to an Axes object, create a patch using one of the classes described
in full in the Matplotlib documentation 12 and call ax.add_artist(patch). To set
the color, line widths, transparency, etc. of the patch, pass one or more of the keywords
listed in Table 7.11 when creating the patch.

Below we describe this usage for a few Patch objects.

Circles and Ellipses 

A Circle centered at xy = (x,y) (in data coordinates) and with radius r is created 
with: 

from matplotlib.patches import Circle 
circle = Circle(xy, r, **kwargs) 

It is added to the Axes with ax.add_artist:

ax.add_artist(circle) 

The supported keyword arguments indicated by **kwargs are the usual patch styling 
ones, summarized in Table 7.11. 

Ellipse patches are similar but take arguments width and height (the total length
of the horizontal and vertical axes of the ellipse before rotation) and angle (the angle
of counterclockwise rotation of the ellipse in degrees).

from matplotlib.patches import Ellipse 

ellipse = Ellipse(xy, width, height, angle, **kwargs) 
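Putting the two patch types together, a short sketch might read as follows. The positions, sizes and colors are arbitrary illustrative values, the angle is passed by keyword as a cautious choice, and the Agg backend is selected only so that the code runs without a display.

```python
import matplotlib
matplotlib.use('Agg')   # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt
from matplotlib.patches import Circle, Ellipse

fig, ax = plt.subplots()
ax.set_aspect('equal')   # so that the circle is drawn round
ax.set_xlim(0, 4)
ax.set_ylim(0, 3)

# A circle of radius 0.5 centered at (1, 1.5)
circle = Circle((1, 1.5), 0.5, fc='b', ec='k', alpha=0.5)

# An ellipse centered at (3, 1.5) with total width 1.5 and total height 0.5,
# rotated counterclockwise by 30 degrees
ellipse = Ellipse((3, 1.5), 1.5, 0.5, angle=30, fc='r', ec='k', lw=2)

for patch in (circle, ellipse):
    ax.add_artist(patch)
```

Note that patches do not change the axis limits by themselves, which is why the limits are set explicitly here.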


Example E7.16 The following code reads in the heights and masses of 260 women and
247 men from the data set published by Heinz et al. 13 and available for download at
www.amstat.org/publications/jse/datasets/body.dat.txt. It plots the (height, mass) pairs
for each individual on a scatterplot and, for each sex, draws a 3σ covariance ellipse
around the mean point. The dimensions of this ellipse are given by the (scaled) square
roots of the eigenvalues of the covariance matrix and it is rotated such that its
semi-major axis lies along the eigenvector with the largest eigenvalue.

Listing 7.16 An analysis of the height-mass relationship in 507 healthy individuals

# eg7-body-mass-height.py
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

FEMALE, MALE = 0, 1
dt = np.dtype([('mass', 'f8'), ('height', 'f8'), ('gender', 'i2')])
data = np.loadtxt('body.dat.txt', usecols=(22,23,24), dtype=dt)

fig, ax = plt.subplots()

def get_cov_ellipse(cov, center, nstd, **kwargs):
    """
    Return a matplotlib Ellipse patch representing the covariance matrix
    cov centered at center and scaled by the factor nstd.

    """

    # Find and sort eigenvalues and eigenvectors into descending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = eigvals.argsort()[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # The counterclockwise angle to rotate our ellipse by
    vx, vy = eigvecs[:,0][0], eigvecs[:,0][1]
    # (1)
    theta = np.arctan2(vy, vx)

    # Width and height of ellipse to draw
    width, height = 2 * nstd * np.sqrt(eigvals)
    return Ellipse(xy=center, width=width, height=height,
                   angle=np.degrees(theta), **kwargs)

labels, colors = ['Female', 'Male'], ['magenta', 'blue']
for gender in (FEMALE, MALE):
    sdata = data[data['gender']==gender]
    height_mean = np.mean(sdata['height'])
    mass_mean = np.mean(sdata['mass'])
    # Covariance matrix in the same (x, y) = (height, mass) order as the plot
    cov = np.cov(sdata['height'], sdata['mass'])
    ax.scatter(sdata['height'], sdata['mass'], color=colors[gender],
               label=labels[gender])
    e = get_cov_ellipse(cov, (height_mean, mass_mean), 3,
                        fc=colors[gender], alpha=0.4)
    ax.add_artist(e)

ax.set_xlim(140, 210)
ax.set_ylim(30, 120)
ax.set_xlabel('Height /cm')
ax.set_ylabel('Mass /kg')
ax.legend(loc='upper left', scatterpoints=1)
plt.show()

(1) The function np.arctan2 returns the “two-argument arctangent”: np.arctan2(y, x)
is the angle in radians between the positive x-axis and the point (x, y).

Figure 7.17 shows the resulting plot.

12 http://matplotlib.org/api/artist_api.html.

13 G. Heinz et al., Journal of Statistical Education 11 (2), 2003. This article is available at
www.amstat.org/publications/jse/v11n2/datasets.heinz.html.
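The link between a covariance matrix and the dimensions and orientation of its ellipse can be checked on a small example. The sketch below uses a made-up 2×2 covariance matrix (not taken from the body-measurement data) whose eigenvalues are 3 and 1 and whose principal eigenvector lies along the line y = x.

```python
import numpy as np

# A made-up covariance matrix, for illustration only
cov = np.array([[2., 1.],
                [1., 2.]])
nstd = 3

# eigh returns the eigenvalues in ascending order: sort them into descending
# order, keeping the eigenvectors (columns) in step
eigvals, eigvecs = np.linalg.eigh(cov)
order = eigvals.argsort()[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Orientation of the major axis. The sign of an eigenvector is arbitrary, so
# the angle is only meaningful modulo 180 degrees
vx, vy = eigvecs[:, 0]
angle = np.degrees(np.arctan2(vy, vx)) % 180

# Full lengths of the major and minor axes of the nstd-sigma ellipse
width, height = 2 * nstd * np.sqrt(eigvals)
print(angle, width, height)
```

For this matrix the major axis should lie at 45 degrees to the x-axis, with full axis lengths of 6√3 and 6 for the 3σ ellipse.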


Rectangles 

Rectangle patches are created in a similar way to Ellipses:

from matplotlib.patches import Rectangle
rectangle = Rectangle(xy, width, height, angle, **kwargs)

Here, however, the tuple xy=(x, y) gives the coordinates of the lower-left corner of the
rectangle. A square is simply a rectangle with the same width and height, of course.

Figure 7.17 Scatterplots for each gender of mass and height for a total of 507 students, with their
covariance ellipses annotated.


Polygons 

A Polygon patch is created by passing an array of shape (n, 2), in which each row
represents the (x, y) coordinates of a vertex. If the additional argument closed is True
(the default), the polygon will be closed so that the start and end points are the same.
This is illustrated in the following example.


Example E7.17 This code produces an image (Figure 7.18) of some colorful shapes.

Listing 7.17 Some colorful shapes

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon, Circle, Rectangle

red, blue, yellow, green = '#ff0000', '#0000ff', '#ffff00', '#00ff00'

square = Rectangle((0.7, 0.1), 0.25, 0.25, facecolor=red)
circle = Circle((0.8, 0.8), 0.15, facecolor=blue)
triangle = Polygon(((0.05,0.1), (0.396,0.1), (0.223, 0.38)), fc=yellow)
rhombus = Polygon(((0.5,0.2), (0.7,0.525), (0.5,0.85), (0.3,0.525)), fc=green)

fig = plt.figure()
# facecolor sets the Axes background color (older Matplotlib releases
# called this argument axisbg)
ax = fig.add_subplot(111, facecolor='k', aspect='equal')
for shape in (square, circle, triangle, rhombus):
    ax.add_artist(shape)
ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)

plt.show()

Figure 7.18 Some colorful shapes using Matplotlib patches.


Questions 

Q7.1.1 Compare plots of $y = x^3$ for $-10 < x < 10$ using a logarithmic scale on the
x-axis, y-axis and both axes. What is the difference between using ax.set_xscale('log')
and ax.set_xscale('symlog')?

Q7.1.2 Adapt Example E7.7 to produce a horizontal bar chart, with the bars in order 
of decreasing letter frequency (i.e., with the most common letter, E, at the bottom). 


Problems 


P7.1.1 The Economist’s Big Mac Index is a lighthearted measure of purchasing power
parity between two currencies. Its premise is that the difference between the price of a
McDonald’s Big Mac hamburger in one currency (converted into US dollars (USD) at
the prevailing exchange rate) and its price in the United States is a measure of the extent
to which that currency is over- or under-valued (relative to the dollar).

The files at scipython.com/ex/aga provide the historical Big Mac prices and
exchange rates for four currencies. For each currency, calculate the percentage over- or
under-valuation as

$$\frac{(\text{local price converted to USD}) - (\text{US price})}{(\text{US price})} \times 100$$

and plot it as a function of time.

P7.1.2 Plot, as a histogram, the data in the table below concerning the number of cases
of West Nile virus disease in the United States between 1999 and 2008. The two types
of disease, neuroinvasive and non-neuroinvasive, should be plotted as separate bars on
the same chart for each year.
Year    Neuroinvasive cases    Non-neuroinvasive cases
1999                     59                          3
2000                     19                          2
2001                     64                          2
2002                  2,946                      1,210
2003                  2,866                      6,996
2004                  1,148                      1,391
2005                  1,309                      1,691
2006                  1,495                      2,774
2007                  1,227                        117
2008                    689                        667


P7.1.3 A bubble chart is a type of scatterplot that can depict three dimensions of data
through the position (x- and y-coordinates) and size of the marker. The plt.scatter
method can produce bubble charts by passing the marker size to its s argument (in
(points)$^2$, such that the area of the marker is proportional to the magnitude of the third
dimension).

The files gdp.tsv, bmi_men.tsv and population_total.tsv, available at
scipython.com/ex/agc, contain the following data from 2007 for each country: the
GDP per capita in international dollars fixed at 2005 prices, the body mass
index (BMI) of men (in kg/m$^2$) and the total population. Generate a bubble chart of
BMI against GDP, in which the population is depicted by the size of the bubble markers.
Beware: some data are missing for some countries.

Bonus exercise: color the bubbles by continent using the list provided in the file 
continents.tsv. 


P7.1.4 The US National Oceanic and Atmospheric Administration (NOAA) makes a
data set of atmospheric carbon dioxide concentrations since 1958 freely available to
the public at ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt. Using these
data, plot the “interpolated” and “trend” CO$_2$ concentrations against time on the same
graph.

P7.1.5 Write a program to plot the Planck function, $B(\lambda)$, for the spectral radiance of a
black body at temperature $T$ as a function of wavelength, $\lambda$, for the Sun ($T = 5778$ K):

$$B(\lambda) = \frac{2hc^2}{\lambda^5}\,\frac{1}{\exp(hc/\lambda k_\mathrm{B} T) - 1}$$

Use a NumPy array to store values of $B(\lambda)$ from 100 to 5,000 nm, but set the wavelength
range to decrease from 4,000 nm to 0.

P7.1.6 Reproduce Figure 7.19 using Circle patches. 
7.2 Contour plots, heatmaps and 3D plots

Until now, we have looked only at plotting one-dimensional data (that is, functions of 
one coordinate only). Matplotlib also supports several ways to plot data that is a function 
of two dimensions. 


7.2.1 Contour plots 

The pyplot method contour makes a contour plot of a provided two-dimensional
array. In its simplest invocation, contour(Z), no further arguments are required:
the (x, y) values are indexes into the two-dimensional array Z and contour intervals
are selected automatically. To explicitly include (x, y) coordinates, pass them as
contour(X, Y, Z). The arrays X and Y must have the same shape as Z (for example,
as produced by np.meshgrid: see Section 6.1.6) or be one-dimensional such that X
has the same length as the number of columns in Z, and Y has the same length as the
number of rows in Z.
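The two ways of supplying coordinates can be compared side by side. This is a minimal sketch with an arbitrary Gaussian test function; the grid sizes (40 columns, 25 rows) are chosen only to make the row and column counts easy to tell apart.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-2, 2, 40)      # one value per column of Z
y = np.linspace(-1, 1, 25)      # one value per row of Z
X, Y = np.meshgrid(x, y)        # both arrays have shape (25, 40)
Z = np.exp(-(X**2 + Y**2))

fig, (ax1, ax2) = plt.subplots(ncols=2)
cs1 = ax1.contour(x, y, Z)      # one-dimensional coordinate arrays
cs2 = ax2.contour(X, Y, Z)      # two-dimensional coordinate arrays
print(Z.shape, X.shape, Y.shape)
```

Both calls should draw the same contours; call plt.show() to display the figure.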

The contour levels can be controlled by a further argument: either a scalar, N, giving 
the total number of contour levels, or a sequence, v, explicitly listing the values of z at 
which to draw contours. 
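The difference between the two forms of the levels argument can be seen by inspecting the levels attribute of the returned ContourSet. The sketch below uses an arbitrary test function; note that with a scalar, Matplotlib chooses "nice" values, so the number of levels drawn may differ slightly from the number requested.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 60)
X, Y = np.meshgrid(x, x)
Z = X * np.exp(-(X**2 + Y**2))

fig, (ax1, ax2) = plt.subplots(ncols=2)

# A scalar requests roughly that many automatically chosen levels ...
cs_auto = ax1.contour(X, Y, Z, 8)

# ... whereas a sequence (in increasing order) fixes the values exactly
wanted = [-0.4, -0.2, -0.1, 0.1, 0.2, 0.4]
cs_fixed = ax2.contour(X, Y, Z, levels=wanted)

print(cs_auto.levels)
print(cs_fixed.levels)
```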

The contours are colored according to Matplotlib’s default colormap. In this process,
the data are normalized linearly onto the interval [0, 1], which is then mapped onto a
list of colors that are used to style the contours at the corresponding values. The module
matplotlib.cm provides several colormap schemes:[14] some of the more practical
ones are cm.hot, cm.bone, cm.winter, cm.jet, cm.Greys and cm.hsv. If you
want to use a colormap with its colors reversed, tack a _r on the end of its name (e.g.,
cm.hot_r).
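The two-step process of normalization followed by color lookup can be reproduced by hand. The following sketch assumes the Normalize class from matplotlib.colors and looks colormaps up by name with plt.get_cmap; the data values are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

data = np.array([-2.0, 0.0, 1.0, 6.0])

# Step 1: linear normalization of the data onto [0, 1]
norm = Normalize(vmin=data.min(), vmax=data.max())
scaled = np.asarray(norm(data))

# Step 2: look up an RGBA color for each normalized value
hot = plt.get_cmap('hot')
hot_r = plt.get_cmap('hot_r')    # the same colors in reverse order
rgba = hot(scaled)
rgba_r = hot_r(scaled)

print(scaled)
print(rgba[0], rgba_r[-1])
```

The smallest value in the normal colormap receives the same color as the largest value in the reversed one.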


[14] See the page http://wiki.scipy.org/Cookbook/Matplotlib/Show_colormaps for a complete list.

As an alternative, contour supports the colors argument, which takes either a
single Matplotlib color specifier or a sequence of such specifiers. For single-color
contour plots, contours corresponding to negative values are plotted in dashed lines. The
widths of the contour lines can be styled individually or all together with the argument
linewidths.


Example E7.18 The following code produces a plot of the electrostatic potential of an
electric dipole $\boldsymbol{p} = (qd, 0, 0)$ in the $(x, y)$ plane for $q = 1.602 \times 10^{-19}$ C, $d = 1$ pm
using the point dipole approximation (see Figure 7.20).

Listing 7.18 The electrostatic potential of a point dipole

# eg7-elec-dipole-pot.py
import numpy as np
import matplotlib.pyplot as plt

# Dipole charge (C), Permittivity of free space (F.m-1)
q, eps0 = 1.602e-19, 8.854e-12
# Dipole +q, -q distance (m) and a convenient combination of parameters
d = 1.e-12
k = 1/4/np.pi/eps0 * q * d

# Cartesian axis system with origin at the dipole (m)
X = np.linspace(-5e-11, 5e-11, 1000)
Y = X.copy()
X, Y = np.meshgrid(X, Y)

# Dipole electrostatic potential (V), using point dipole approximation
Phi = k * X / np.hypot(X, Y)**3

fig = plt.figure()
ax = fig.add_subplot(111)
# Draw contours at values of Phi given by levels
levels = np.array([10**pw for pw in np.linspace(0,5,20)])
# Contour levels must be given in increasing order
levels = sorted(list(-levels) + list(levels))
# Monochrome plot of potential
ax.contour(X, Y, Phi, levels=levels, colors='k', linewidths=2)
plt.show()


To add labels to the contours, store the ContourSet object returned by the call
to ax.contour and pass it to ax.clabel (perhaps with some additional parameters
dictating the font properties). A further method, ax.contourf, which takes the same
arguments as contour, draws filled contours. ax.contour and ax.contourf can be used
together, as in the following example.


Example E7.19 This program produces a filled contour plot of a function, labels the
contours and provides some custom styling for their colors (see Figure 7.21).

Figure 7.20 A contour plot of the electrostatic potential of a point dipole.

Listing 7.19 An example of filled and styled contours

# eg7-2dgau.py
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm

X = np.linspace(0,1,100)
Y = X.copy()
X, Y = np.meshgrid(X, Y)
alpha = np.radians(25)
cX, cY = 0.5, 0.5
sigX, sigY = 0.2, 0.3
rX = np.cos(alpha) * (X-cX) - np.sin(alpha) * (Y-cY) + cX
rY = np.sin(alpha) * (X-cX) + np.cos(alpha) * (Y-cY) + cY

Z = (rX-cX)*np.exp(-((rX-cX)/sigX)**2) * np.exp(-((rY-cY)/sigY)**2)

fig = plt.figure()
ax = fig.add_subplot(111)

# Reversed Greys colormap for filled contours
cpf = ax.contourf(X,Y,Z, 20, cmap=cm.Greys_r)
# Set the colors of the contours and labels so they're white where the
# contour fill is dark (Z < 0) and black where it's light (Z >= 0)
colors = ['w' if level<0 else 'k' for level in cpf.levels]
cp = ax.contour(X, Y, Z, 20, colors=colors)
ax.clabel(cp, fontsize=12, colors=colors)
plt.show()


7.2.2 Heatmaps 

Another way to depict two-dimensional data is as a heatmap: an image in which the
color of each pixel is determined by the corresponding value in the array of data. The use
of Matplotlib’s functions ax.imshow, ax.pcolor and ax.pcolormesh is described
in this section.

Figure 7.21 A two-dimensional plot with labeled contours.

ax.imshow 

The Axes method ax.imshow displays an image on the axes. In its basic usage, it takes
a two-dimensional array and maps its values to the pixels on an image according to
some interpolation scheme and normalization. If the array data are taken from an image
read in with the Matplotlib method image.imread, this is usually all that is required:

In [x]: import matplotlib.pyplot as plt 
In [x]: import matplotlib.image as mpimg 
In [x]: im = mpimg.imread('image.jpg') 

In [x]: plt.imshow(im) 

In [x]: plt.show() 

(In this case, im is a three-dimensional array of shape (n, m, 3) in which the “depth” 
coordinate corresponds to the red, green and blue components of each pixel in the 
n-by-m image.) 

imshow is frequently used to visualize matrices or other two-dimensional arrays of
data. The default interpolation produces somewhat blurry-looking images for small
arrays; for example, to visualize a 10 x 10 matrix as a 100 x 100 pixel image, a lot
of intermediate points need to be approximated. To display the image with no
interpolation, set interpolation='none' or interpolation='nearest', as shown in the
following example. Note that imshow takes a cmap argument that assigns a colormap
in the same way as it does for ax.contourf.


Example E7.20 The following code compares two interpolation schemes: 'bilinear'
(the default in older Matplotlib releases), which looks blurry for a small array, and
'nearest', which looks “blocky” but is more faithful to the data: see Figure 7.22.

Figure 7.22 A small matrix visualized using ax.imshow with two different interpolation
schemes.

Listing 7.20 A comparison of interpolation schemes for a small array visualized with imshow()

# eg7-matrix-show.py
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm

# Make an array with ones in the shape of an 'X'
a = np.eye(10,10)
a += a[::-1,:]

fig = plt.figure()
ax1 = fig.add_subplot(121)
# Bilinear interpolation - this will look blurry. It is requested
# explicitly because the default scheme depends on the Matplotlib version
ax1.imshow(a, interpolation='bilinear', cmap=cm.Greys_r)

ax2 = fig.add_subplot(122)
# 'nearest' interpolation - faithful but blocky
ax2.imshow(a, interpolation='nearest', cmap=cm.Greys_r)

plt.show()


Example E7.21 The Barnsley Fern is a fractal that resembles the Black Spleenwort
species of fern. It is constructed by plotting a sequence of points in the $(x, y)$ plane,
starting at $(0, 0)$, generated by the following affine transformations $f_1$, $f_2$, $f_3$ and $f_4$,
where each transformation is applied to the previous point and chosen at random with
probabilities $p_1 = 0.01$, $p_2 = 0.85$, $p_3 = 0.07$ and $p_4 = 0.07$.

$$
\begin{aligned}
f_1(x, y) &= \begin{pmatrix} 0 & 0 \\ 0 & 0.16 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \\
f_2(x, y) &= \begin{pmatrix} 0.85 & 0.04 \\ -0.04 & 0.85 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 0 \\ 1.6 \end{pmatrix} \\
f_3(x, y) &= \begin{pmatrix} 0.2 & -0.26 \\ 0.23 & 0.22 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 0 \\ 1.6 \end{pmatrix} \\
f_4(x, y) &= \begin{pmatrix} -0.15 & 0.28 \\ 0.26 & 0.24 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 0 \\ 0.44 \end{pmatrix}
\end{aligned}
$$

This algorithm is implemented in the program below and the result is depicted in
Figure 7.23.

Figure 7.23 The Barnsley fern fractal.

Listing 7.21 Barnsley’s fern

# eg7-fern.py
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm

f1 = lambda x,y: (0., 0.16*y)
f2 = lambda x,y: (0.85*x + 0.04*y, -0.04*x + 0.85*y + 1.6)
f3 = lambda x,y: (0.2*x - 0.26*y, 0.23*x + 0.22*y + 1.6)
f4 = lambda x,y: (-0.15*x + 0.28*y, 0.26*x + 0.24*y + 0.44)
fs = [f1, f2, f3, f4]

npts = 50000
# Canvas size (pixels)
width, height = 300, 300
aimg = np.zeros((width, height))

x, y = 0, 0
for i in range(npts):
    # Pick a random transformation and apply it
    f = np.random.choice(fs, p=[0.01, 0.85, 0.07, 0.07])
    x, y = f(x,y)
    # Map (x,y) to pixel coordinates (array indexes must be integers).
    # NB we "know" that -2.2 < x < 2.7 and 0 <= y < 10
    ix, iy = int(width / 2 + x * width / 10), int(y * height / 12)
    # Set this point of the array to 1 to mark a point in the fern
    aimg[iy, ix] = 1

plt.imshow(aimg[::-1,:], cmap=cm.Greens)
plt.show()
ax.pcolor and ax.pcolormesh

There are a couple of other similar Matplotlib methods that you will come across:
ax.pcolor and ax.pcolormesh. These are very similar. The precise differences are
beyond the scope of this book, but pcolormesh is very much faster than pcolor and is
the recommended alternative to imshow for this reason. The most noticeable difference
is that imshow follows the convention used in the image-processing community that
places the origin in the top-left corner; the pcolor methods associate the origin with
the bottom-left corner.
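The difference in origin convention is easy to see with a small array whose first row is marked. This is a minimal sketch; the array contents and subplot layout are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# A small array whose first row is easy to recognize
a = np.zeros((3, 4))
a[0, :] = 1

fig, (ax1, ax2) = plt.subplots(ncols=2)
ax1.imshow(a, interpolation='nearest')    # row 0 is drawn at the top
ax2.pcolormesh(a)                         # row 0 is drawn at the bottom
ax1.set_title('imshow')
ax2.set_title('pcolormesh')

# The y-axis runs downward for imshow and upward for pcolormesh
print(ax1.yaxis_inverted(), ax2.yaxis_inverted())
```

Call plt.show() to display the two panels; passing origin='lower' to imshow is one way to make its orientation match that of pcolormesh.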

Color bars 

It is often useful to have a legend indicating how the colors of the plot relate to the
values of the array used to derive it. This is added with the fig.colorbar method.
In its simplest usage, call fig.colorbar(mappable), where mappable is
the Image, ContourSet or other suitable object to which the colorbar applies, and a
new Axes object holding the colorbar will be created (and room made in the figure to
accommodate it). This object can be further customized and labeled, as shown in the
following examples.


Example E7.22 The following code reads in a data file of maximum daily temperatures
in Boston for 2012 and plots them on a heatmap, with a labeled colorbar legend (see
Figure 7.24). The data file may be downloaded from scipython.com/eg/aah.

Listing 7.22 Heatmap of Boston’s temperatures in 2012

# eg7-heatmap.py
import numpy as np
import matplotlib.pyplot as plt

# Read in the relevant data from our input file. The built-in int and float
# are used here: recent NumPy releases no longer provide np.int and np.float
dt = np.dtype([('month', int), ('day', int), ('T', float)])
data = np.genfromtxt('boston2012.dat', dtype=dt, usecols=(1,2,3),
                     delimiter=(4,2,2,6))

# In our heatmap, nan will mean "no such date", e.g., 31 June
heatmap = np.empty((12, 31))
heatmap[:] = np.nan

for month, day, T in data:
    # NumPy arrays are zero-indexed; days and months are not!
    heatmap[month-1, day-1] = T

# Plot the heatmap, customize and label the ticks
fig = plt.figure()
ax = fig.add_subplot(111)
im = ax.imshow(heatmap, interpolation='nearest')
ax.set_yticks(range(12))
ax.set_yticklabels(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                    'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
days = np.array(range(0, 31, 2))
ax.set_xticks(days)
ax.set_xticklabels(['{:d}'.format(day+1) for day in days])
ax.set_xlabel('Day of month')
ax.set_title('Maximum daily temperatures in Boston, 2012')

# Add a colorbar along the bottom and label it
cbar = fig.colorbar(ax=ax, mappable=im, orientation='horizontal')    # ➊
cbar.set_label(r'Temperature, $^\circ\mathrm{C}$')

plt.show()

➊ The “mappable” object passed to fig.colorbar is the AxesImage object returned
by ax.imshow.

Figure 7.24 A heatmap of maximum daily temperatures in Boston during 2012.


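Example E7.23 The two-dimensional diffusion equation is

$$\frac{\partial U}{\partial t} = D\left(\frac{\partial^2 U}{\partial x^2} + \frac{\partial^2 U}{\partial y^2}\right)$$

where $D$ is the diffusion coefficient. A simple numerical solution on the domain of the
unit square $0 < x < 1$, $0 < y < 1$ approximates $U(x, y; t)$ by the discrete function $u^{(n)}_{i,j}$
where $x = i\Delta x$, $y = j\Delta y$ and $t = n\Delta t$. Applying finite difference approximations
(forward-difference in time, central-difference in space) yields

$$\frac{u^{(n+1)}_{i,j} - u^{(n)}_{i,j}}{\Delta t} = D\left[\frac{u^{(n)}_{i+1,j} - 2u^{(n)}_{i,j} + u^{(n)}_{i-1,j}}{(\Delta x)^2} + \frac{u^{(n)}_{i,j+1} - 2u^{(n)}_{i,j} + u^{(n)}_{i,j-1}}{(\Delta y)^2}\right]$$

and hence the state of the system at time step $n + 1$, $u^{(n+1)}_{i,j}$, may be calculated from its
state at time step $n$, $u^{(n)}_{i,j}$, through the equation

$$u^{(n+1)}_{i,j} = u^{(n)}_{i,j} + D\Delta t\left[\frac{u^{(n)}_{i+1,j} - 2u^{(n)}_{i,j} + u^{(n)}_{i-1,j}}{(\Delta x)^2} + \frac{u^{(n)}_{i,j+1} - 2u^{(n)}_{i,j} + u^{(n)}_{i,j-1}}{(\Delta y)^2}\right].$$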
Consider the diffusion equation applied to a metal plate initially at temperature $T_\mathrm{cool}$
apart from a disc of a specified size, which is at temperature $T_\mathrm{hot}$. We suppose that the
edges of the plate are held fixed at $T_\mathrm{cool}$. The following code applies the above formula
to follow the evolution of the temperature of the plate. It can be shown that the maximum
time step, $\Delta t$, that we can allow without the process becoming unstable is

$$\Delta t = \frac{1}{2D}\,\frac{(\Delta x\,\Delta y)^2}{(\Delta x)^2 + (\Delta y)^2}.$$
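As a quick worked check of this limit, the sketch below evaluates it for the grid spacing (0.1 mm) and thermal diffusivity (4 mm$^2$ s$^{-1}$) used in this example's listing. For equal spacings the expression reduces to $(\Delta x)^2/4D$.

```python
# Grid spacings (mm) and thermal diffusivity (mm2.s-1), as in this example
dx = dy = 0.1
D = 4.

dx2, dy2 = dx * dx, dy * dy
# Largest stable time step for the explicit finite difference scheme
dt = dx2 * dy2 / (2 * D * (dx2 + dy2))

# For dx = dy this should agree with dx**2 / (4 * D)
dt_equal = dx2 / (4 * D)
print('dt = {:.3e} s'.format(dt))
```

The result, 0.625 ms, is the interval between successive states in the simulation.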

In the code below, each call to do_timestep updates the NumPy array u from the
results of the previous time step, u0. The simplest approach to applying the partial
difference equation is to use a Python loop:

for i in range(1, nx-1):
    for j in range(1, ny-1):
        uxx = (u0[i+1,j] - 2*u0[i,j] + u0[i-1,j]) / dx2
        uyy = (u0[i,j+1] - 2*u0[i,j] + u0[i,j-1]) / dy2
        u[i,j] = u0[i,j] + dt * D * (uxx + uyy)

However, this runs extremely slowly and using vectorization will farm out these
explicit loops to the much faster precompiled C code underlying NumPy’s array
implementation.
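That the sliced (vectorized) form performs the same update as the explicit loops can be confirmed on a small grid. The following sketch uses arbitrary random test data and a deliberately small grid (20 x 30), neither of which is part of the example itself.

```python
import numpy as np

nx, ny = 20, 30
dx = dy = 0.1
D = 4.
dx2, dy2 = dx * dx, dy * dy
dt = dx2 * dy2 / (2 * D * (dx2 + dy2))

rng = np.random.default_rng(0)      # arbitrary test data
u0 = rng.random((nx, ny))

# Explicit Python loops over the interior points
u_loop = u0.copy()
for i in range(1, nx-1):
    for j in range(1, ny-1):
        uxx = (u0[i+1,j] - 2*u0[i,j] + u0[i-1,j]) / dx2
        uyy = (u0[i,j+1] - 2*u0[i,j] + u0[i,j-1]) / dy2
        u_loop[i,j] = u0[i,j] + dt * D * (uxx + uyy)

# The same update written with array slices
u_vec = u0.copy()
u_vec[1:-1, 1:-1] = u0[1:-1, 1:-1] + D * dt * (
      (u0[2:, 1:-1] - 2*u0[1:-1, 1:-1] + u0[:-2, 1:-1]) / dx2
    + (u0[1:-1, 2:] - 2*u0[1:-1, 1:-1] + u0[1:-1, :-2]) / dy2 )

print(np.allclose(u_loop, u_vec))
```

In the slices, u0[2:, 1:-1] plays the role of u0[i+1,j] for every interior point at once, u0[:-2, 1:-1] that of u0[i-1,j], and so on; the edge rows and columns are never written to, so they keep their boundary values.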

The state of the system is plotted as an image at four different stages of its evolution 
(see Figure 7.25). 

Listing 7.23 The two-dimensional diffusion equation applied to the temperature of a steel plate

# eg7-diffusion2d.py
import numpy as np
import matplotlib.pyplot as plt

# plate size, mm
w = h = 10.
# intervals in x-, y- directions, mm
dx = dy = 0.1
# Thermal diffusivity of steel, mm2.s-1
D = 4.

Tcool, Thot = 300, 700

nx, ny = int(w/dx), int(h/dy)

dx2, dy2 = dx*dx, dy*dy
dt = dx2 * dy2 / (2 * D * (dx2 + dy2))

u0 = Tcool * np.ones((nx, ny))
# Start u as a copy of u0 so that its edges hold the boundary value Tcool
u = u0.copy()

# Initial conditions - disc of radius r centered at (cx,cy) (mm)
r, cx, cy = 2, 5, 5
r2 = r**2
for i in range(nx):
    for j in range(ny):
        p2 = (i*dx-cx)**2 + (j*dy-cy)**2
        if p2 < r2:
            u0[i,j] = Thot

def do_timestep(u0, u):
    # Propagate with forward-difference in time, central-difference in space
    u[1:-1, 1:-1] = u0[1:-1, 1:-1] + D * dt * (
          (u0[2:, 1:-1] - 2*u0[1:-1, 1:-1] + u0[:-2, 1:-1])/dx2
          + (u0[1:-1, 2:] - 2*u0[1:-1, 1:-1] + u0[1:-1, :-2])/dy2 )

    u0 = u.copy()
    return u0, u

# Number of timesteps
nsteps = 101
# Output 4 figures at these timesteps
mfig = [0, 10, 50, 100]
fignum = 0
fig = plt.figure()
for m in range(nsteps):
    u0, u = do_timestep(u0, u)
    if m in mfig:
        fignum += 1
        print(m, fignum)
        ax = fig.add_subplot(220 + fignum)
        im = ax.imshow(u.copy(), cmap=plt.get_cmap('hot'), vmin=Tcool,vmax=Thot)
        ax.set_axis_off()
        ax.set_title('{:.1f} ms'.format(m*dt*1000))
fig.subplots_adjust(right=0.85)
cbar_ax = fig.add_axes([0.9, 0.15, 0.03, 0.7])    # ➊
cbar_ax.set_xlabel('$T$ / K', labelpad=20)
fig.colorbar(im, cax=cbar_ax)
plt.show()

➊ To set a common colorbar for the four plots we define its own Axes, cbar_ax,
and make room for it with fig.subplots_adjust. The plots all use the same color
range, defined by vmin and vmax, so it doesn’t matter which one we pass in the first
argument to fig.colorbar.

Figure 7.25 A representation of the temperature of a circular disc at four times after its
instantaneous heating.
7.2.3 3D plots

Matplotlib is primarily a 2D plotting library, but it does support 3D plotting
functionality that is good enough for many purposes. The easiest way to set up a 3D plot is
to import Axes3D from the mpl_toolkits.mplot3d module and to set the subplot’s
projection argument to '3d':

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

The corresponding Axes object can then depict data in three dimensions as a line
plot, scatterplot, wireframe plot or surface plot.[15]

ax.plot_wireframe and ax.plot_surface

The simplest kind of surface plot is a wireframe plot that draws lines in 3D
perspective joining the provided two-dimensional array of points, Z, on a grid of data values
provided as two-dimensional arrays X and Y (as for imshow and contour). By default,
wires are drawn for every point in the array: if this is too many, set the arguments
rstride and cstride to specify the array row step size and column step size.

The ax.plot_surface method is similar but produces a surface plot of filled
patches. The patch colors can be set to a single color with the color argument or styled
to a specified color map with the cmap argument. rstride and cstride default to
10 for the ax.plot_surface method. Both methods are illustrated in the following
example.


Example E7.24 Some of the different options for producing surface plots are illustrated
by the code below, which produces Figure 7.26.

Listing 7.24 Four 3D plots of a simple two-dimensional Gaussian function

# eg7-3d-surface-plots.py
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.cm as cm

L, n = 2, 400
x = np.linspace(-L, L, n)
y = x.copy()
X, Y = np.meshgrid(x, y)
Z = np.exp(-(X**2 + Y**2))

fig, ax = plt.subplots(nrows=2, ncols=2, subplot_kw={'projection': '3d'})
ax[0,0].plot_wireframe(X, Y, Z, rstride=40, cstride=40)
ax[0,1].plot_surface(X, Y, Z, rstride=40, cstride=40, color='m')
ax[1,0].plot_surface(X, Y, Z, rstride=12, cstride=12, color='m')
ax[1,1].plot_surface(X, Y, Z, rstride=20, cstride=20, cmap=cm.hot)
for axes in ax.flatten():
    axes.set_xticks([-2, -1, 0, 1, 2])
    axes.set_yticks([-2, -1, 0, 1, 2])
    axes.set_zticks([0, 0.5, 1])
fig.tight_layout()
plt.show()

Figure 7.26 Four different 3D surface plots of the same function.

[15] It is even possible to produce three-dimensional contour plots and bar charts, though these are of doubtful
use in practice.

In an interactive plot, the viewing direction can be changed by clicking and dragging 
on the plot. To fix a particular viewing direction for a static plot image, pass the required 
elevation and azimuthal angles (in degrees, in that order) to ax.view_init, as in the 
following example. 


Example E7.25 The parametric description of a torus with radius $c$ and tube radius $a$ is

$$
\begin{aligned}
x &= (c + a\cos\theta)\cos\phi \\
y &= (c + a\cos\theta)\sin\phi \\
z &= a\sin\theta
\end{aligned}
$$

for $\theta$ and $\phi$ each between 0 and $2\pi$. The code below outputs two views of a torus
rendered as a surface plot (Figure 7.27).

Listing 7.25 A 3D surface plot of a torus

# eg7-torus-surface.py
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

n = 100

theta = np.linspace(0, 2.*np.pi, n)
phi = np.linspace(0, 2.*np.pi, n)
theta, phi = np.meshgrid(theta, phi)    # ➊
c, a = 2, 1
x = (c + a*np.cos(theta)) * np.cos(phi)
y = (c + a*np.cos(theta)) * np.sin(phi)
z = a * np.sin(theta)

fig = plt.figure()
ax1 = fig.add_subplot(121, projection='3d')
ax1.set_zlim(-3,3)
ax1.plot_surface(x, y, z, rstride=5, cstride=5, color='k', edgecolors='w')    # ➋
ax1.view_init(36, 26)    # ➌

ax2 = fig.add_subplot(122, projection='3d')
ax2.set_zlim(-3,3)
ax2.plot_surface(x, y, z, rstride=5, cstride=5, color='k', edgecolors='w')
ax2.view_init(0, 0)
ax2.set_xticks([])
plt.show()

➊ We need $\theta$ and $\phi$ to range over the interval $(0, 2\pi)$ independently, so we use a
meshgrid.

➋ Note that we can use keywords such as edgecolors to style the polygon patches
created by ax.plot_surface.

➌ Elevation angle above the xy-plane of 36°, azimuthal angle in the xy-plane of 26°.

Figure 7.27 Two views of the same torus: (a) elevation 36°, azimuth 26°; (b) elevation 0°, azimuth 0°.


Line plots and scatterplots 

Line plots and scatterplots work in 3D in a way similar to how they work in 2D: the
basic method calls are ax.plot(x, y, z) and ax.scatter(x, y, z), where x, y
and z are equal-length, one-dimensional arrays. Only limited annotation of such plots
is possible without using advanced methods, however.

Example E7.26 Below is a simple example of a three-dimensional plot of a helix,
which could represent circularly polarized light, for example. See Figure 7.28.

Figure 7.28 A depiction of circularly polarized light as a helix on a three-dimensional plot.

Listing 7.26 A depiction of a helix on a three-dimensional plot

# eg7-circular-polarization.py
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

n = 1000

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Plot a helix along the x-axis
theta_max = 8 * np.pi
theta = np.linspace(0, theta_max, n)
x = theta
z = np.sin(theta)
y = np.cos(theta)
ax.plot(x, y, z, 'b', lw=2)

# A line through the center of the helix
ax.plot((-theta_max*0.2, theta_max * 1.2), (0,0), (0,0), color='k', lw=2)
# sin/cos components of the helix (e.g., electric and magnetic field
# components of a circularly polarized electromagnetic wave)
ax.plot(x, y, 0, color='r', lw=1, alpha=0.5)
ax.plot(x, [0]*n, z, color='m', lw=1, alpha=0.5)

# Remove axis planes, ticks and labels
ax.set_axis_off()
plt.show()


7.2.4 Exercises 
Questions 

Q7.2.1 Generate an image plot of the sinc function in the Cartesian plane, $\mathrm{sinc}(r) = \sin r / r$
where $r = \sqrt{x^2 + y^2}$.

Q7.2.2 The data provided in the comma-separated file birthday-data.csv, available
at scipython.com/ex/agd, gives the number of births recorded by the US Centers
for Disease Control and Prevention’s National Center for Health Statistics for each day
of the year as a total from years 1969-1988. The columns are month number (1=January,
12=December), day number and number of live births.

Use NumPy to estimate, for each day of the year, the probability of a particular
individual’s birthday being on that day. Plot the probabilities as a heatmap like that
of Example E7.22 and investigate any features of interest.

Hint: the data need “cleaning” to a small extent - inspect the data file first to establish
the presence of any incorrect entries.


Problems 

P7.2.1 The so-called “chaos game” is an algorithm for generating a fractal. First define
the $n$ vertices of a regular polygon and an initial point, $(x_0, y_0)$, selected at random within
the polygon. Then generate a sequence of points, starting with $(x_0, y_0)$, where each point
is a fraction $r$ of the distance between the previous one and a polygon vertex chosen at
random. For example, the algorithm applied with parameters $n = 3$, $r = 0.5$ generates
a Sierpinski triangle.

Write a program to draw fractals using the chaos game algorithm. 


P7.2.2 Extend the code in Example E7.16 to include contours of body mass index,
defined by BMI = (mass/kg)/(height/m)$^2$. Plot these contours to delimit the supposed
categories of “under-weight” (< 18.5), “over-weight” (> 25) and “obese” (> 30).
Manually place the contour labels so that they are out of the way of the scatterplotted data
points and format them to one decimal place.


P7.2.3 The two-dimensional advection equation may be written

$$\frac{\partial u}{\partial t} = -v_x\frac{\partial u}{\partial x} - v_y\frac{\partial u}{\partial y},$$

where $\boldsymbol{v} = (v_x, v_y)$ is the vector velocity field (giving the velocity components $v_x$ and
$v_y$, which may vary as a function of position, $(x, y)$). In a similar way to the approach
taken in Example E7.23, this equation may be discretized and solved numerically. With
forward-differences in time and central-differences in space, we have

$$u^{(n+1)}_{i,j} = u^{(n)}_{i,j} - \Delta t\left[v_{x;i,j}\,\frac{u^{(n)}_{i+1,j} - u^{(n)}_{i-1,j}}{2\Delta x} + v_{y;i,j}\,\frac{u^{(n)}_{i,j+1} - u^{(n)}_{i,j-1}}{2\Delta y}\right].$$

Implement this approximate numerical solution on the domain $0 < x < 10$, $0 < y < 10$
discretized with $\Delta x = \Delta y = 0.1$ with the initial condition

$$u_0(x, y) = \exp\left(-\frac{(x - c_x)^2 + (y - c_y)^2}{\alpha^2}\right)$$

where $(c_x, c_y) = (5, 5)$ and $\alpha = 2$. Take the velocity field to be a circulation at constant
speed 0.1 about an origin at (7, 5).
P7.2.4 The Julia set associated with the complex function $f(z) = z^2 + c$ may be
depicted using the following algorithm.

For each point, $z_0$, in the complex plane such that $-1.5 < \mathrm{Re}[z_0] < 1.5$ and
$-1.5 < \mathrm{Im}[z_0] < 1.5$, iterate according to $z_{n+1} = z_n^2 + c$. Color the pixel in an image
corresponding to this region of the complex plane according to the number of iterations required
for $|z|$ to exceed some critical value, $|z|_\mathrm{max}$ (or black if this does not happen before a
certain maximum number of iterations, $n_\mathrm{max}$).

Write a program to plot the Julia set for $c = -0.1 + 0.65i$, using $|z|_\mathrm{max} = 10$ and
$n_\mathrm{max} = 500$.


P7.2.5 The mean altitudes of the 10 km x 10 km hectad squares used by the UK’s
Ordnance Survey in mapping Great Britain are given in the NumPy array file gb-alt.npy,
available at scipython.com/ex/agb. NaN values in this array denote the sea.

Plot a map of the island using this data with ax.imshow and plot further maps
assuming a mean sea-level rise of (a) 10 m, (b) 50 m, (c) 200 m. In each case, deduce
the percentage of land area remaining, relative to its present value.
8 SciPy


SciPy is a library of Python modules for scientific computing that provides more specific
functionality than the generic data structures and mathematical algorithms of NumPy.
For example, it contains modules for the evaluation of special functions frequently
encountered in science and engineering, optimization, integration, interpolation and
image manipulation. As with the NumPy library, many of SciPy’s underlying algorithms
are executed as compiled C code, so they are fast. Also like NumPy and Python itself,
SciPy is free software.

There is little new syntax to learn in using the SciPy routines, so this chapter will
focus on examples of the library’s use in short programs of relevance to science and
engineering.


8.1 Physical constants and special functions 

The useful scipy.constants package provides the internationally agreed standard
values and uncertainties for physical constants. The scipy.special package also
supplies a large number of algorithms for calculating functions that appear in science,
mathematical analysis and engineering, including:

• Airy functions 

• Elliptic functions and integrals 

• Bessel functions, their zeros, derivatives and integrals 

• Spherical Bessel functions 

• Struve functions 

• A variety of statistical functions and distributions 

• Gamma and beta functions 

• The error function 

• Fresnel integrals 

• Legendre functions and associated Legendre functions 

• A variety of orthogonal polynomials 

• Hypergeometric functions 

• Parabolic cylinder functions 

• Mathieu functions

• Spheroidal functions 

• Kelvin functions 

They are described in detail in the documentation;¹ we focus in this section on a few
representative examples.

Most of these special functions are implemented in SciPy as universal functions: that 
is, they support broadcasting and vectorization (automatic array-looping), and so work 
as expected with NumPy arrays. 
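
As a brief illustration of this behaviour, the following sketch evaluates two of the special
functions on arrays. The array names (x, orders, table) are chosen here purely for
illustration and are not part of the original text.

```python
import numpy as np
from scipy.special import jv, gamma

x = np.linspace(0, 10, 5)          # five sample points, shape (5,)
orders = np.array([0, 1, 2])       # three Bessel function orders, shape (3,)

# Broadcasting: a column of orders against a row of x values gives a
# (3, 5) table with table[i, j] = J_orders[i](x[j])
table = jv(orders[:, np.newaxis], x)
print(table.shape)

# Vectorization: a single call evaluates the function for every element
factorials = gamma(np.array([1, 2, 3, 4, 5]))   # (n-1)! for n = 1, ..., 5
print(factorials)
```

No explicit Python loop is needed in either case; the looping is carried out in compiled code.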


8.1.1 Physical constants 

SciPy contains the 2010 CODATA internationally recommended values² of many
physical constants. They are held, with their units and uncertainties, in a dictionary,
scipy.constants.physical_constants, keyed by an identifying string. For
example,

In [x]: import scipy.constants as pc 

In [x]: pc.physical_constants['Avogadro constant'] 

Out[x]: (6.02214129e+23, 'mol^-1', 2.7e+16)

The convenience methods value, unit and precision retrieve the corresponding 
properties on their own: 

In [x]: pc.value('elementary charge')
Out[x]: 1.602176565e-19

In [x]: pc.unit('elementary charge')
Out[x]: 'C'

In [x]: pc.precision('elementary charge')
Out[x]: 2.1845282701410628e-08

To save typing, it is usual to assign the value to a variable name at the start of a program, 
for example, 

In [x]: muB = pc.value('Bohr magneton')

A full list of the constants and their names is given in the SciPy documentation,³
but Table 8.1 lists the more important ones. Some particularly important constants have
a direct variable assignment within scipy.constants (in SI units) and so can be
imported directly:

In [x]: from scipy.constants import c, R, k 

In [x]: c, R, k # speed of light, gas constant, Boltzmann constant 

Out[x]: (299792458.0, 8.3144621, 1.3806488e-23) 

Where this is the case, the variable name is given in the table. You will probably find it
convenient to use the scipy.constants values, but should be aware that if and when
newer values are released the package may be updated - this means that your code may
produce slightly different results for different versions of SciPy.

There are also some useful conversion factors and methods, and SI prefixes, defined
within the scipy.constants package, for example,


1 http://docs.scipy.org/doc/scipy/reference/special.html.

2 P. J. Mohr, B. N. Taylor, D. B. Newell (2012). Rev. Mod. Phys., 84, 1527.

3 http://docs.scipy.org/doc/scipy/reference/constants.html.
Table 8.1 Physical constants in scipy.constants

Constant string                        Variable   Value               Units
-------------------------------------  ---------  ------------------  ---------------
'atomic mass constant'                            1.660538921e-27     kg
'Avogadro constant'                    N_A        6.02214129e+23      mol^-1
'Bohr magneton'                                   9.27400968e-24      J T^-1
'Bohr radius'                                     5.2917721092e-11    m
'Boltzmann constant'                   k          1.3806488e-23       J K^-1
'electron mass'                        m_e        9.10938291e-31      kg
'elementary charge'                    e          1.602176565e-19     C
'Faraday constant'                                96485.3365          C mol^-1
'fine-structure constant'              alpha      0.0072973525698
'molar gas constant'                   R          8.3144621           J K^-1 mol^-1
'neutron mass'                         m_n        1.674927351e-27     kg
'Newtonian constant of gravitation'    G          6.67384e-11         m^3 kg^-1 s^-2
'Planck constant'                      h          6.62606957e-34      J s
'Planck constant over 2 pi'            hbar       1.054571726e-34     J s
'proton mass'                          m_p        1.672621777e-27     kg
'Rydberg constant'                     Rydberg    10973731.5685       m^-1
'speed of light in vacuum'             c          299792458.0         m s^-1


In [x]: import scipy.constants as pc

In [x]: pc.atm
Out[x]: 101325.0                # 1 atm in Pa

In [x]: pc.bar
Out[x]: 100000.0                # 1 bar in Pa

In [x]: pc.torr
Out[x]: 133.32236842105263      # 1 torr in Pa

In [x]: pc.zero_Celsius
Out[x]: 273.15                  # 0 degC in K

In [x]: pc.micro                # also nano, pico, giga, etc.
Out[x]: 1e-06
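
The constants and conversion helpers combine naturally in short calculations. The sketch
below estimates the thermal energy kT at 25 degC in electron-volts and converts a pressure
from torr to pascals; the variable names (T, kT_eV, p) are illustrative only, and
convert_temperature is assumed to be available (it is provided by recent versions of
scipy.constants).

```python
import scipy.constants as pc

# Convert 25 degC to kelvin using the helper function
T = pc.convert_temperature(25.0, 'Celsius', 'Kelvin')

# Thermal energy kT, first in joules and then in electron-volts
kT_joule = pc.k * T
kT_eV = kT_joule / pc.e

# 760 torr expressed in pascals (this is one standard atmosphere)
p = 760 * pc.torr

print(T, kT_eV, p)
```

Working in SI units throughout and converting only at the end, as here, helps to avoid
unit errors.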


Example E8.1 Let's use the scipy.constants.physical_constants dictionary
to determine which are the least accurately known constants. To do this we need the
relative uncertainties in the constants' values. The code here uses a structured
array to calculate these and outputs the least well-determined constants.

Listing 8.1 Least well-defined physical constants 

import numpy as np
from scipy.constants import physical_constants

def make_record(k, v):
    """
    Return the record for this constant from the key and value of its entry
    in the physical_constants dictionary.

    """

    name = k
    val, units, abs_unc = v
    # Calculate the relative uncertainty in ppm
    rel_unc = abs_unc / abs(val) * 1.e6
    return name, val, units, abs_unc, rel_unc

dtype = [('name', 'S50'), ('val', 'f8'), ('units', 'S20'),
         ('abs_unc', 'f8'), ('rel_unc', 'f8')]
constants = np.array([make_record(k, v) for k,v in physical_constants.items()],
                     dtype=dtype)
constants.sort(order='rel_unc')

# List the 10 constants with the largest relative uncertainties
for rec in constants[-10:]:
    print('{:.0f} ppm: {:s} = {:g} {:s}'.format(rec['rel_unc'],
          rec['name'].decode(), rec['val'], rec['units'].decode()))


The output is shown here. Note that G is not known to better than about 120 ppm
(parts per million).

91 ppm: proton-tau mass ratio = 0.528063
91 ppm: tau mass energy equivalent = 2.84678e-10 J
92 ppm: tau mass = 3.16747e-27 kg
119 ppm: Newtonian constant of gravitation over h-bar c = 6.70837e-39 (GeV/c^2)^-2
120 ppm: Newtonian constant of gravitation = 6.67384e-11 m^3 kg^-1 s^-2
545 ppm: proton mag. shielding correction = 2.5694e-05
545 ppm: proton magn. shielding correction = 2.5694e-05
980 ppm: deuteron rms charge radius = 2.1424e-15 m
5812 ppm: proton rms charge radius = 8.775e-16 m
9447 ppm: weak mixing angle = 0.2223


8.1.2 Airy and Bessel functions 


The Airy functions Ai(x) and Bi(x) are the linearly independent solutions to the Airy
equation, $y'' - xy = 0$, which occurs in quantum mechanics, optics, electrodynamics
and other areas of physics. The functions (Ai, Bi) and their derivatives (Aip, Bip)
are returned by the function scipy.special.airy. The only required argument is x,
which could be complex and can be a NumPy array:

In [x]: from scipy.special import airy, ai_zeros

In [x]: Ai, Aip, Bi, Bip = airy(0)

In [x]: Ai, Aip, Bi, Bip
Out[x]: (0.35502805388781722, -0.25881940379280682, 0.61492662744600068, 0.44828835735382638)

The first nt zeros of the Airy functions and their derivatives are returned by the
function scipy.special.ai_zeros(nt):

In [x]: a, ap, ai, aip = ai_zeros(2)    # arrays for the first 2 zeros of Ai

In [x]: a[1], ap[1], ai[1], aip[1]      # look at the 2nd zero:
Out[x]: (-4.0879494441309721, -3.248197582179837, -0.41901547803256406,
         -0.80311136965486463)

In [x]: airy(a[1])[0]                   # Ai(a) should = 0
Out[x]: 1.2774882441379295e-15          # close enough

In [x]: airy(ap[1])[1]                  # Aip(ap) should = 0
Out[x]: -3.2322209157744908e-16         # close enough

In [x]: airy(ap[1])[0]                  # Ai(ap) is returned as ai above
Out[x]: -0.41901547803256395

In [x]: airy(a[1])[1]                   # Aip(a) is returned as aip above
Out[x]: -0.80311136965486396
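
A further check on the values returned by airy is the Wronskian of the two solutions,
Ai(x)Bi'(x) - Ai'(x)Bi(x), which is a standard result equal to 1/pi for all x. The
following minimal sketch (the grid of x values is arbitrary) verifies this numerically.

```python
import numpy as np
from scipy.special import airy

x = np.linspace(-5, 2, 8)
Ai, Aip, Bi, Bip = airy(x)       # each returned item is an array like x

# The Wronskian of Ai and Bi should equal 1/pi everywhere
wronskian = Ai * Bip - Aip * Bi
print(np.allclose(wronskian, 1 / np.pi))
```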


◊


Example E8.2 Consider a particle of mass m moving in a constant gravitational field
such that its potential energy at a height z above a surface is mgz. If the particle bounces
elastically on the surface, the classical probability density corresponding to its position is

$$P_\mathrm{cl}(z) = \frac{1}{2\sqrt{z_\mathrm{max}(z_\mathrm{max} - z)}},$$

where $z_\mathrm{max}$ is the maximum height it reaches.

The quantum mechanical behaviour of this system may be described by the solution to
the time-independent Schrödinger equation,

$$-\frac{\hbar^2}{2m}\frac{\mathrm{d}^2\psi}{\mathrm{d}z^2} + mgz\psi = E\psi,$$

which is simplified by the coordinate rescaling $q = z/\alpha$ where $\alpha = (\hbar^2/2m^2g)^{1/3}$:

$$\frac{\mathrm{d}^2\psi}{\mathrm{d}q^2} - (q - q_E)\psi = 0, \quad \mathrm{where}\; q_E = \frac{E}{mg\alpha}.$$

The solutions to this differential equation are the Airy functions. The boundary
condition $\psi(z) \rightarrow 0$ as $z \rightarrow \infty$ specifically gives:

$$\psi(q) = N_E \mathrm{Ai}(q - q_E),$$

where $N_E$ is a normalization constant.

The second boundary condition, $\psi(q = 0) = 0$, leads to quantization in terms of a
quantum number $n = 1, 2, 3, \cdots$ with scaled energy values $q_E$ found from the zeros of
the Airy function: $\mathrm{Ai}(-q_E) = 0$.

The following program plots the classical and quantum probability distributions,
$P_\mathrm{cl}(z)$ and $|\psi(z)|^2$, for n = 1 and n = 16 (Figure 8.1).


Figure 8.1 A comparison of classical and quantum probability distributions for a particle moving
in a constant gravitational field at two different energies.

Listing 8.2 Probability densities for a particle in a uniform gravitational field

# eg8-qm-gravfield.py
import numpy as np
from scipy.special import airy, ai_zeros
import pylab

nmax = 16

# Find the first nmax zeros of Ai(x)
a, _, _, _ = ai_zeros(nmax)                 # ❶
# The actual boundary condition is Ai(-qE) = 0 at q=0, so:
qE = -a

def prob_qm(n):
    """
    Return the quantum mechanical probability density for a particle moving
    in a uniform gravitational field.

    """

    # The quantum mechanical wavefunction is proportional to Ai(q-qE) where
    # the qE corresponding to quantum number n is indexed at n-1
    psi, _, _, _ = airy(q-qE[n-1])          # ❷
    # Return the probability density, after rough-and-ready normalization
    P = psi**2
    return P / (sum(P) * dq)                # ❸

def prob_cl(n):
    """
    Return the classical probability density for a particle bouncing
    elastically in a uniform gravitational field.

    """

    # The classical probability density is already normalized
    return 0.5/np.sqrt(qE[n-1]*(qE[n-1]-q))

# The ground state, n=1
q, dq = np.linspace(0, 4, 1000, retstep=True)
pylab.plot(q, prob_cl(1), label='Classical')
pylab.plot(q, prob_qm(1), label='Quantum')
pylab.ylim(0,0.8)
pylab.legend()
pylab.show()

# An excited state, n=16
q, dq = np.linspace(0, 20, 1000, retstep=True)
pylab.plot(q, prob_cl(16), label='Classical')
pylab.plot(q, prob_qm(16), label='Quantum')
pylab.ylim(0,0.25)
pylab.legend(loc='upper left')
pylab.show()

❶ We use scipy.special.ai_zeros to retrieve the n = 1 and n = 16 eigenvalues.

❷ scipy.special.airy finds the corresponding wavefunctions and hence probability
densities.

❸ For the sake of illustration, these are normalized approximately by a very simple
numerical integration.


Bessel functions are another important class of function with many applications to
physics and engineering. SciPy provides several functions for evaluating them, their
derivatives and their zeros.

• jn(v, x) and jv(v, x) return the Bessel function of the first kind at x for order
v, $J_v(x)$. v can be real or integer.

• yn(n, x) and yv(v, x) return the Bessel function of the second kind at x for
integer order n ($Y_n(x)$) and real order v ($Y_v(x)$), respectively.

• iv(v, x) returns the modified Bessel function of the first kind at x for order
v, $I_v(x)$; v can be real or integer. (There is no function named in, because in is a
reserved word in Python.)

• kn(n, x) and kv(v, x) return the modified Bessel function of the second kind
at x for integer order n ($K_n(x)$) and real order v ($K_v(x)$), respectively.

• The functions jvp(v, x), yvp(v, x), ivp(v, x) and kvp(v, x) return the
derivatives of the functions listed above. By default, the first derivative is
returned; to return the nth derivative, set the optional argument, n.

• Several functions can be used to obtain the zeros of the Bessel functions. Probably
the most useful are jn_zeros(n, nt), jnp_zeros(n, nt), yn_zeros(n, nt) and
ynp_zeros(n, nt), which return the first nt zeros of $J_n(x)$, $J'_n(x)$, $Y_n(x)$
and $Y'_n(x)$.
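
The functions listed above can be checked against one another. The sketch below, which
uses an arbitrary grid of x values, confirms that J_0 vanishes at the values returned by
jn_zeros, and tests the derivative function jvp against two standard identities:
J_0'(x) = -J_1(x) and, from Bessel's equation, J_0''(x) = -J_0(x) + J_1(x)/x.

```python
import numpy as np
from scipy.special import jv, jvp, jn_zeros

# The first three positive zeros of J_0(x)
zeros = jn_zeros(0, 3)
print(zeros)
print(jv(0, zeros))                # all very close to zero

x = np.linspace(0.5, 10, 20)

# First derivative (the default, n=1): J_0'(x) = -J_1(x)
print(np.allclose(jvp(0, x), -jv(1, x)))

# Second derivative, selected with the optional argument n
d2 = jvp(0, x, n=2)
print(np.allclose(d2, -jv(0, x) + jv(1, x) / x))
```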


Example E8.3 The vibrations of a thin circular membrane stretched across a rigid
circular frame (such as a drum head) can be described as normal modes written in terms
of Bessel functions:

$$z(r, \theta; t) = A J_n(kr) \sin n\theta \cos k\nu t,$$

where $(r, \theta)$ describes a position in polar coordinates with the origin at the center of the
membrane, t is time and $\nu$ is a constant depending on the tension and surface density of
the drum. The modes are labeled by integers $n = 0, 1, \cdots$ and $m = 1, 2, 3, \cdots$ where k
is the mth zero of $J_n$.

Figure 8.2 The n = 3, m = 2 normal mode of a vibrating circular drum.

The following program produces a plot of the displacement of the membrane in the 
n = 3, m = 2 normal mode at time t = 0 (Figure 8.2). 

Listing 8.3 Normal modes of a vibrating circular drum 

# eg8-drum-normal-modes.py
import numpy as np
from scipy.special import jn, jn_zeros
import pylab

# Allow calculations up to m = mmax
mmax = 5

def displacement(n, m, r, theta):
    """
    Calculate the displacement of the drum membrane at (r, theta; t=0)
    in the normal mode described by integers n >= 0, 0 < m <= mmax.

    """

    # Pick off the mth zero of Bessel function Jn: jn_zeros returns the zeros
    # in a zero-indexed array, so the mth zero is at index m-1
    k = jn_zeros(n, mmax)[m-1]
    return np.sin(n*theta) * jn(n, r*k)

# Positions on the drum surface are specified in polar coordinates
r = np.linspace(0, 1, 100)
theta = np.linspace(0, 2 * np.pi, 100)

# Create arrays of cartesian coordinates (x, y) ...
x = np.array([rr*np.cos(theta) for rr in r])
y = np.array([rr*np.sin(theta) for rr in r])
# ... and vertical displacement (z) for the required normal mode at
# time, t = 0
n, m = 3, 2
z = np.array([displacement(n, m, rr, theta) for rr in r])

pylab.contour(x, y, z)
pylab.show()


Example E8.4 In an important paper in 1953,⁴ Rosalind Franklin published the X-ray
diffraction pattern of DNA from calf thymus, which displays a characteristic X shape
of diffraction spots indicative of a helical structure.

The diffraction pattern of a uniform, continuous helix consists of a series of "layer
lines" of spacing 1/p in reciprocal space where p is the helix pitch (the height of
one complete turn of the helix, measured parallel to its axis). The intensity distribution
along the nth layer line is proportional to the square of the nth Bessel function,
$J_n(2\pi rR)$, where r is the radius of the helix and R is the radial coordinate in reciprocal
space.

Consider the diffraction pattern of a helix with p = 34 Å and r = 10 Å.
The code listing here produces an SVG image of the diffraction pattern of a helix
(Figure 8.3).


4 R. E. Franklin, R. G. Gosling, (1953). Nature 171, 740. 
Listing 8.4 Generating an image of the diffraction pattern of a uniform, continuous helix

# eg8-dna-diffraction.py
import numpy as np
from scipy.special import jn
import pylab

# Vertical range of the diffraction pattern: plot nlayers line layers above and
# below the center horizontal
nlayers = 5
ymin, ymax = -nlayers, nlayers
# Horizontal range of the diffraction pattern, x = 2pi.r.R
xmin, xmax = -10, 10
npts = 4000
x = np.linspace(xmin, xmax, npts)
# Diffraction pattern along each line layer: |Jn(x)|^2
# for n = 0, 1, ..., nlayers-1
layers = np.array([jn(i, x)**2 for i in range(nlayers)])            # ❶
# Obtain the indexes of the maxima in each layer
maxi = [(np.diff(np.sign(np.diff(layers[i,:]))) < 0).nonzero()[0] + 1
        for i in range(nlayers)]                                    # ❷

# Create the SVG image, using circles of different radii for diffraction spots
svg_name = 'eg8-dna-diffraction.svg'
canvas_width = canvas_height = 500
fo = open(svg_name, 'w')
print("""<?xml version="1.0" encoding="utf-8"?>
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     width="{}" height="{}" style="background: {}">""".format(
        canvas_width, canvas_height, '#ffffff'), file=fo)

def svg_circle(r, cx, cy):
    """ Return the SVG mark up for a circle of radius r centered at (cx,cy). """
    return r'<circle r="{}" cx="{}" cy="{}"/>'.format(r, cx, cy)

# For each spot in each layer, draw a circle on the canvas. The circle radius
# is the scaled value of the diffraction intensity maximum, with a ceiling
# value of spot_max_radius because the center spots are very intense
spot_scaling, spot_max_radius = 50, 20
for i in range(nlayers):
    for j in maxi[i]:
        sx = (x[j] - xmin)/(xmax-xmin) * canvas_width               # ❸
        sy = (i - ymin)/(ymax - ymin) * canvas_height
        spot_radius = min(layers[i,j]*spot_scaling, spot_max_radius)
        print(svg_circle(spot_radius, sx, sy), file=fo)
        if i:
            # The pattern is symmetric about the center horizontal:
            # duplicate the layers with i > 0
            sy = canvas_height - sy
            print(svg_circle(spot_radius, sx, sy), file=fo)
print(r'</svg>', file=fo)
fo.close()

❶ The two-dimensional array, layers, holds the diffraction intensity in each line layer,
calculated as the square of a Bessel function.

❷ For plotting the pattern, we need to find the indexes of the maxima in the layers
array: this line of code finds these maxima by determining where the differences
between neighboring items go from positive to negative.

❸ Map the (x, y) coordinates in the reciprocal space of the diffraction pattern onto the
canvas coordinates, (sx, sy).


8.1.3 The gamma and beta functions; elliptic integrals 


The gamma function is defined by the improper integral

$$\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,\mathrm{d}t$$

for real x > 0, and extended to negative x and complex numbers by analytic continuation.
It occurs frequently in integration problems, combinatorics and in expressions for
other special functions.

The gamma function and its natural logarithm are returned by the functions gamma(x)
and gammaln(x). There are also methods for the evaluation of the incomplete gamma
functions (obtained by replacing the lower or upper limits in the integral above with the
parameter a) and their inverses; these will not be described in detail here.
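
A short sketch of these functions in use follows. The numerical arguments are arbitrary.
Note that SciPy's gammainc and gammaincc return the regularized incomplete gamma
functions (that is, divided by the complete gamma function), so for a given pair of
arguments they sum to one.

```python
import numpy as np
from scipy.special import gamma, gammaln, gammainc, gammaincc

print(gamma(5))              # (5-1)! = 24
print(gamma(0.5)**2)         # Gamma(1/2) is the square root of pi

# gamma overflows for large arguments, but its logarithm remains finite
print(gamma(200.5), gammaln(200.5))

# Regularized lower and upper incomplete gamma functions
a, x = 2.5, 1.7
P, Q = gammainc(a, x), gammaincc(a, x)
print(P + Q)
```

Working with gammaln rather than gamma is the usual way to avoid overflow when ratios of
large gamma functions or factorials are needed.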


Example E8.5 The gamma function is related to the factorial by $\Gamma(x) = (x - 1)!$ and
both are plotted by the code below (see Figure 8.4). Note that $\Gamma(x)$ is not
defined for zero or negative integer x, which leads to discontinuities in the plot.

Listing 8.5 The Gamma function on the real line 

# eg8-gamma.py
import numpy as np
from scipy.special import gamma
import pylab

# The Gamma function
ax = pylab.linspace(-5, 5, 1000)
pylab.plot(ax, gamma(ax), ls='-', c='k', label=r'$\Gamma(x)$')

# (x-1)! for x = 1, 2, ..., 6
ax2 = pylab.linspace(1,6,6)
xm1fac = np.array([1, 1, 2, 6, 24, 120])
pylab.plot(ax2, xm1fac, marker='*', markersize=12, markeredgecolor='r',
           ls='',c='r', label='$(x-1)!$')

pylab.ylim(-50,50)
pylab.xlim(-5, 5)
pylab.xlabel('$x$')
pylab.legend()
pylab.show()
Figure 8.4 The gamma function on the real line, $\Gamma(x)$, and $(x - 1)!$ for integer x > 0.


The beta function is defined by the definite integral

$$B(a, b) = \int_0^1 t^{a-1}(1 - t)^{b-1}\,\mathrm{d}t, \quad a > 0, b > 0.$$

It is closely related to the gamma function: $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$. The
scipy.special functions beta(a, b) and betaln(a, b) return the beta function
and its natural logarithm respectively. As with the gamma function, there is an
incomplete beta function, $B(a, b; x)$, obtained by replacing the upper limit with x;
the methods betainc(a, b, x) and betaincinv(a, b, y) return this function
and its inverse. Note that betainc returns the regularized form, $B(a, b; x)/B(a, b)$.
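
These relations are easily verified numerically. In the following sketch the values of a, b
and x are arbitrary choices made for illustration.

```python
import numpy as np
from scipy.special import beta, betaln, betainc, betaincinv, gamma

a, b = 2.5, 1.5

# The beta function in terms of gamma functions
B = beta(a, b)
print(np.isclose(B, gamma(a) * gamma(b) / gamma(a + b)))
print(np.isclose(betaln(a, b), np.log(B)))

# betainc is regularized, so it rises from 0 at x=0 to 1 at x=1 ...
print(betainc(a, b, 0.0), betainc(a, b, 1.0))

# ... and betaincinv inverts it with respect to x
x = 0.3
y = betainc(a, b, x)
x_back = betaincinv(a, b, y)
print(x_back)
```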


Example E8.6 The exact classical mechanical description of a pendulum is quite complex,
and the equations of motion are usually only solved in introductory texts for small
displacements about equilibrium. In this case, the period, $T \approx 2\pi\sqrt{L/g}$, and the motion
is harmonic.

The general solution requires elliptic integrals, but the special case of a pendulum
making 180° swings (i.e., ±90° about its equilibrium position) leads to the following
expression for the period:

$$T = 2\sqrt{2}\sqrt{\frac{L}{g}} \int_0^{\pi/2} \frac{\mathrm{d}\theta}{\sqrt{\cos\theta}}.$$

The substitution $x = \sin^2\theta$ transforms this integral into a beta function:

$$\int_0^{\pi/2} \frac{\mathrm{d}\theta}{\sqrt{\cos\theta}} = \frac{1}{2}\int_0^1 x^{-1/2}(1 - x)^{-3/4}\,\mathrm{d}x = \frac{1}{2} B\left(\tfrac{1}{2}, \tfrac{1}{4}\right).$$

Therefore,

$$T = \sqrt{\frac{2L}{g}}\, B\left(\tfrac{1}{2}, \tfrac{1}{4}\right).$$

To find the period of the pendulum in units of $\sqrt{L/g}$:

In [x]: import numpy as np

In [x]: from scipy.special import beta

In [x]: np.sqrt(2) * beta(0.5, 0.25)
Out[x]: 7.4162987092054875

(Compare with the harmonic approximation, $2\pi = 6.283185$.)


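The elliptic integrals and related functions form an important class of mathematical
objects and have been widely studied. They find application in geometry,
cryptography, analysis and many areas of physics. The complete elliptic integrals of
the first and second kind, K(m) and E(m), are defined for 0 ≤ m ≤ 1 by

$$K(m) = \int_0^{\pi/2} \frac{\mathrm{d}\theta}{\sqrt{1 - m\sin^2\theta}}, \quad E(m) = \int_0^{\pi/2} \sqrt{1 - m\sin^2\theta}\,\mathrm{d}\theta.$$

Their values for the parameter m are returned by the functions ellipk(m) and
ellipe(m). The incomplete elliptic integrals (defined by replacing the upper limit of
$\pi/2$ with the variable $\phi$) are returned by ellipkinc(phi, m) and ellipeinc(phi, m)
respectively:⁵

$$K(\phi, m) = \int_0^{\phi} \frac{\mathrm{d}\theta}{\sqrt{1 - m\sin^2\theta}}, \quad E(\phi, m) = \int_0^{\phi} \sqrt{1 - m\sin^2\theta}\,\mathrm{d}\theta.$$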
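
A quick numerical check of these definitions is sketched below, using an arbitrary value of
the parameter, m = 0.5. For m = 0 both integrands are unity, so K(0) = E(0) = pi/2, and the
incomplete integrals evaluated at phi = pi/2 must reproduce the complete ones.

```python
import numpy as np
from scipy.special import ellipk, ellipe, ellipkinc, ellipeinc

# With m = 0 the integrands are both 1, so the integrals are pi/2
print(ellipk(0), ellipe(0))

m = 0.5
K, E = ellipk(m), ellipe(m)

# The incomplete integrals with upper limit pi/2 are the complete integrals
K_inc = ellipkinc(np.pi / 2, m)
E_inc = ellipeinc(np.pi / 2, m)
print(np.isclose(K, K_inc), np.isclose(E, E_inc))
```

Note that the second argument of these functions is the parameter m, not the modulus
k (where m = k^2).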



Example E8.7 The problem of finding an arc length of an ellipse is the origin of the
name of the elliptic integrals. The equation of an ellipse with semi-major axis, a, and
semi-minor axis, b, may be written in parametric form as

$$x = a\sin\phi, \quad y = b\cos\phi.$$


5 It is necessary to be very careful with the notation of elliptic integrals; many sources use $F(\phi, m)$ instead of
$K(\phi, m)$ for the first kind, define them with interchanged arguments (i.e., $F(m, \phi)$) or use the parameter $k^2$
instead of m.





The element of length along the ellipse's perimeter is

$$\mathrm{d}s = \sqrt{\mathrm{d}x^2 + \mathrm{d}y^2} = \sqrt{a^2\cos^2\phi + b^2\sin^2\phi}\,\mathrm{d}\phi = a\sqrt{1 - e^2\sin^2\phi}\,\mathrm{d}\phi,$$

where $e = \sqrt{1 - b^2/a^2}$ is the eccentricity. The arc length may therefore be written in
terms of incomplete elliptic integrals of the second kind with parameter $m = e^2$:

$$s = a\int_{\phi_1}^{\phi_2} \sqrt{1 - e^2\sin^2\phi}\,\mathrm{d}\phi = a[E(\phi_2, e^2) - E(\phi_1, e^2)].$$

Earth's orbit is an ellipse with semi-major axis 149,598,261 km and eccentricity
0.01671123. We will find the distance traveled by the Earth in one orbit, and compare it
with that obtained assuming a circular orbit of radius 1 AU = 149,597,870.7 km.

The perimeter of an ellipse may be written using the earlier expression with $\phi_1 = 0$,
$\phi_2 = 2\pi$:

$$P = a[E(2\pi, e^2) - E(0, e^2)] = 4aE(e^2),$$

since the entire perimeter is four times the quarter-perimeter, which may be written in
terms of the complete elliptic integral of the second kind. Because ellipe takes the
parameter m as its argument, it must be passed $e^2$ and not e. We have

In [x]: import numpy as np

In [x]: from scipy.special import ellipe

In [x]: a, e = 149598261, 0.01671123    # semi-major axis (km), eccentricity

In [x]: pe = 4 * a * ellipe(e**2)       # ellipe takes the parameter m = e**2

In [x]: print('{:.6g}'.format(pe))
9.39888e+08                             # "exact" answer

In [x]: AU = 149597870.7                # mean orbit radius, km

In [x]: pc = 2 * np.pi * AU

In [x]: print(pc)
939951143.1675915                       # assuming circular orbit

In [x]: print('{:.4f}'.format((pc - pe) / pe * 100))
0.0067

That is, the percentage error in the perimeter in treating the orbit as circular is about
0.0067%.


8.1.4 The error function and related integrals 

The error function, defined by

$$\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\,\mathrm{d}t$$

for real or complex z, does not have a simple closed-form expression and so must be
calculated numerically. scipy.special has several functions relating to the error
function:

• erf(z): the error function;

• erfc(z): the complementary error function, erfc(z) = 1 - erf(z). It is more
accurate to use this function for large z than directly subtracting erf(z) from 1;

• erfcx(z): the scaled complementary error function, $e^{z^2}\mathrm{erfc}(z)$;

• erfinv(y): the inverse error function;

• erfcinv(y): the inverse complementary error function;

• wofz(z): the Faddeeva function, a scaled complementary error function with
complex argument:

$$w(z) = e^{-z^2}\mathrm{erfc}(-iz) = \mathrm{erfcx}(-iz),$$

which appears in problems related to plasma physics and radiative transfer;

• dawsn(z): the related integral known as Dawson's integral:

$$D(z) = e^{-z^2} \int_0^z e^{t^2}\,\mathrm{d}t.$$
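
The relations between these functions can be illustrated with a short sketch. The
arguments used here (z = 10, the complex number zc and the value 0.3) are arbitrary
choices for the purposes of demonstration.

```python
import numpy as np
from scipy.special import erf, erfc, erfcx, erfinv, wofz

# For large z, erf(z) rounds to exactly 1 in double precision, so the
# subtraction loses all information; erfc retains it
z = 10.0
naive = 1 - erf(z)
direct = erfc(z)
print(naive, direct)

# The scaled complementary error function, exp(z^2) erfc(z)
print(np.isclose(erfcx(3.0), np.exp(9.0) * erfc(3.0)))

# The Faddeeva function is erfcx evaluated at -iz
zc = 1.0 + 0.5j
print(np.isclose(wofz(zc), erfcx(-1j * zc)))

# erfinv inverts erf
roundtrip = erfinv(erf(0.3))
print(roundtrip)
```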


Example E8.8 The wavefunction corresponding to the ground state of the one-dimensional
quantum harmonic oscillator may be written as follows in terms of a
parameter $\alpha = \sqrt{mk}/\hbar$, where m is the mass and k the oscillator force constant:

$$\psi_0(x) = \left(\frac{\alpha}{\pi}\right)^{1/4} \exp\left(-\alpha x^2/2\right).$$

The probability density of the oscillator's position is given by $P_0(x) = |\psi_0(x)|^2$
and is nonzero outside the classical turning points, $x = \pm\alpha^{-1/2}$, a phenomenon known as
tunneling. We will calculate the probability of tunneling for an oscillator in the state $\psi_0$.
The wavefunction is symmetric about x = 0, so the probability of tunneling is

$$P(x < -\alpha^{-1/2}) + P(x > \alpha^{-1/2}) = 2P(x > \alpha^{-1/2}) = 2\sqrt{\frac{\alpha}{\pi}} \int_{\alpha^{-1/2}}^{\infty} \exp\left(-\alpha x^2\right)\mathrm{d}x$$

$$= \frac{2}{\sqrt{\pi}} \int_1^{\infty} e^{-y^2}\,\mathrm{d}y = \mathrm{erfc}(1),$$

where the substitution $y = \alpha^{1/2}x$ has been made. The complementary error function can
be calculated directly:

In [x]: from scipy.special import erfc

In [x]: erfc(1)
Out[x]: 0.15729920705028516

or about 16%.


Example E8.9 The Voigt line profile occurs in the modeling and analysis of radiative
transfer in the atmosphere. It is the convolution of a Gaussian profile, $G(x; \sigma)$, and a
Lorentzian profile, $L(x; \gamma)$:

$$V(x; \sigma, \gamma) = \int_{-\infty}^{\infty} G(x'; \sigma) L(x - x'; \gamma)\,\mathrm{d}x',$$

where

$$G(x; \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right) \quad \mathrm{and} \quad L(x; \gamma) = \frac{\gamma/\pi}{x^2 + \gamma^2}.$$

Here $\gamma$ is the half-width at half-maximum (HWHM) of the Lorentzian profile and $\sigma$
is the standard deviation of the Gaussian profile, related to its HWHM, $\alpha$, by
$\alpha = \sigma\sqrt{2\ln 2}$. In terms of frequency, $\nu$, $x = \nu - \nu_0$ where $\nu_0$ is the line center.

There is no closed form for the Voigt profile, but it is related to the real part of the
Faddeeva function, w(z), by

$$V(x; \sigma, \gamma) = \frac{\mathrm{Re}[w(z)]}{\sigma\sqrt{2\pi}}, \quad \mathrm{where}\; z = \frac{x + i\gamma}{\sigma\sqrt{2}}.$$

The program here plots the Voigt profile for $\gamma = 0.1$, $\alpha = 0.1$ and
compares it with the corresponding Gaussian and Lorentzian profiles (Figure 8.5). The
equations above are implemented in the three functions, G, L and V, defined
in the code here.

Figure 8.5 A comparison of the Lorentzian, Gaussian and Voigt line shapes with $\gamma = \alpha = 0.1$.

Listing 8.6 A comparison of the Lorentzian, Gaussian and Voigt line shapes 

# eg8-voigt.py
import numpy as np
from scipy.special import wofz
import pylab

def G(x, alpha):
    """ Return Gaussian line shape at x with HWHM alpha """
    return np.sqrt(np.log(2) / np.pi) / alpha\
                             * np.exp(-(x / alpha)**2 * np.log(2))

def L(x, gamma):
    """ Return Lorentzian line shape at x with HWHM gamma """
    return gamma / np.pi / (x**2 + gamma**2)

def V(x, alpha, gamma):
    """
    Return the Voigt line shape at x with Lorentzian component HWHM gamma
    and Gaussian component HWHM alpha.

    """
    sigma = alpha / np.sqrt(2 * np.log(2))

    return np.real(wofz((x + 1j*gamma)/sigma/np.sqrt(2))) / sigma\
                                                           / np.sqrt(2*np.pi)

alpha, gamma = 0.1, 0.1
x = np.linspace(-0.8,0.8,1000)
pylab.plot(x, G(x, alpha), ls=':', c='k', label='Gaussian')
pylab.plot(x, L(x, gamma), ls='--', c='k', label='Lorentzian')
pylab.plot(x, V(x, alpha, gamma), c='k', label='Voigt')
pylab.legend()
pylab.show()


8.1.5 Fresnel integrals 

The Fresnel integrals are encountered in optics and are defined by the equations

$$S(z) = \int_0^z \sin\left(\frac{\pi t^2}{2}\right)\mathrm{d}t, \quad C(z) = \int_0^z \cos\left(\frac{\pi t^2}{2}\right)\mathrm{d}t.$$

Both are returned in a tuple for real or complex argument z by the scipy.special
function fresnel(z). The related function, fresnel_zeros(nt), returns the first
nt complex zeros of S(z) and C(z).


Example E8.10 As well as playing an important role in the description of diffraction
effects in optics, the Fresnel integrals find an application in the design of motorway
junctions (freeway intersections). The curve described by the parametric equations
(x, y) = (S(t), C(t)) is called a clothoid (or Euler spiral) and has the property that its
curvature is proportional to the distance along the path of the curve. Hence, a vehicle
traveling at constant speed will experience a constant rate of angular acceleration as it
travels around the curve - this means that the driver can turn the steering wheel at a
constant rate, which makes the junction safer.

The following code plots the Euler spiral for -10 ≤ t ≤ 10 (Figure 8.6).

In [x]: import numpy as np 

In [x]: from scipy.special import fresnel 

In [x]: import pylab 

In [x]: t = np.linspace(-10, 10, 1000) 

In [x]: pylab.plot(*fresnel(t), c='k') 

In [x]: pylab.show() 
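
The defining property of the clothoid can be checked numerically. The sketch below is a
rough verification by finite differences (using np.gradient, with an arbitrary grid on
0 ≤ t ≤ 3) and not a method from the text: it confirms that the curve is traversed at unit
speed, so that t is the arc length, and that the curvature is pi*t.

```python
import numpy as np
from scipy.special import fresnel

t = np.linspace(0, 3, 3001)
S, C = fresnel(t)

# First and second derivatives of (x, y) = (S, C) by finite differences
dS, dC = np.gradient(S, t), np.gradient(C, t)
d2S, d2C = np.gradient(dS, t), np.gradient(dC, t)

# Speed along the curve and curvature, |x'y'' - y'x''| / speed^3
speed = np.hypot(dS, dC)
kappa = np.abs(dS * d2C - dC * d2S) / speed**3

# Ignore the end points, where the finite differences are less accurate
inner = slice(2, -2)
print(np.allclose(speed[inner], 1, atol=1e-3))
print(np.allclose(kappa[inner], np.pi * t[inner], atol=1e-2))
```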


8.1.6 Binomial coefficients and exponential integrals 


The binomial coefficient $\binom{n}{k} = {}^nC_k$ is returned by the scipy.special function
binom(n, k).
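
For example, the following sketch evaluates a single coefficient and a whole row of
Pascal's triangle. Note that binom returns floating point values; the standard library
function math.comb (Python 3.8 and later) is used here only as an exact integer
cross-check.

```python
from math import comb
import numpy as np
from scipy.special import binom

# A single binomial coefficient, returned as a float
c52 = binom(5, 2)
print(c52, comb(5, 2))

# binom is a universal function: one call gives a row of Pascal's triangle
row = binom(4, np.arange(5))
print(row)
```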
Figure 8.6 The Euler spiral.


Various functions are supplied for the evaluation of different forms of the exponential
integral. The standard form is returned by expi(z):

$$\mathrm{Ei}(z) = \int_{-\infty}^{z} \frac{e^t}{t}\,\mathrm{d}t, \quad |\arg(-z)| < \pi;$$

expn(n, x) returns the value of

$$E_n(x) = \int_1^{\infty} \frac{e^{-xt}}{t^n}\,\mathrm{d}t.$$

For n = 1, it is faster and more accurate to use exp1(z):

$$E_1(z) = \int_1^{\infty} \frac{e^{-zt}}{t}\,\mathrm{d}t.$$
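
The different forms are related to one another, as the following sketch shows for a few
arbitrary positive real arguments. It uses two standard results: E_1(x) = -Ei(-x) for
x > 0, and the recurrence relation n E_{n+1}(x) = e^{-x} - x E_n(x).

```python
import numpy as np
from scipy.special import expi, expn, exp1

x = np.array([0.5, 1.0, 2.0])

# expn with n = 1 agrees with the specialized function exp1
print(np.allclose(expn(1, x), exp1(x)))

# For real x > 0, E_1(x) = -Ei(-x)
print(np.allclose(exp1(x), -expi(-x)))

# Recurrence relation with n = 2: 2 E_3(x) = exp(-x) - x E_2(x)
lhs = 2 * expn(3, x)
rhs = np.exp(-x) - x * expn(2, x)
print(np.allclose(lhs, rhs))
```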


Example E8.11 Any integral of the form

$$\int f(z) e^z\,\mathrm{d}z,$$

where f(z) = P(z)/Q(z) is a rational function, can be reduced to the form

$$\int R(z) e^z\,\mathrm{d}z + \sum_i c_i \int \frac{e^z}{(z - a_i)^{n_i}}\,\mathrm{d}z,$$

where R(z) is a polynomial (which may be zero) by expansion in partial fractions.
The first integral here can be evaluated by standard methods (repeated integration by
parts). Provided the path of integration does not pass through any singular points of the
integrand, the second term can be written in terms of exponential integrals.

For example, consider the integral

$$I = \int_{-\infty}^{-2} \frac{e^z}{z^2(z - 1)}\,\mathrm{d}z.$$

It can easily be shown that

$$\frac{1}{z^2(z - 1)} = \frac{1}{z - 1} - \frac{1}{z} - \frac{1}{z^2},$$

and so the integral may be written as the three terms

$$I = \int_{-\infty}^{-2} \frac{e^z}{z - 1}\,\mathrm{d}z - \int_{-\infty}^{-2} \frac{e^z}{z}\,\mathrm{d}z - \int_{-\infty}^{-2} \frac{e^z}{z^2}\,\mathrm{d}z.$$

The second term is simply $-\mathrm{Ei}(-2)$ and the substitution u = z - 1 resolves the first
integral to $e\,\mathrm{Ei}(-3)$. The last integral may be written in terms of $E_n(z)$ or further reduced
by integration by parts to

$$\int_{-\infty}^{-2} \frac{e^z}{z^2}\,\mathrm{d}z = \left[-\frac{e^z}{z}\right]_{-\infty}^{-2} + \int_{-\infty}^{-2} \frac{e^z}{z}\,\mathrm{d}z = \frac{e^{-2}}{2} + \mathrm{Ei}(-2).$$

Therefore,

$$I = e\,\mathrm{Ei}(-3) - 2\mathrm{Ei}(-2) - \frac{e^{-2}}{2}.$$

In SciPy,

In [x]: import numpy as np

In [x]: from scipy.special import expi

In [x]: np.e * expi(-3) - 2*expi(-2) - np.exp(-2)/2
Out[x]: -0.0053357974213484663


8.1.7 Orthogonal polynomials and spherical harmonics

There are a large number of functions in scipy.special for the evaluation of
different sorts of orthogonal polynomials, including the Legendre, Jacobi, Laguerre,
Hermite and different flavors of Chebyshev polynomials. They take the general name
eval_poly(n, x) where n is the order of the polynomial and x is an array-like
sequence of values at which to evaluate the polynomial. Table 8.2 gives the names of
some of these functions.
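
As a minimal check of three of these functions, the sketch below compares their output
with the well-known closed forms $P_2(x) = (3x^2 - 1)/2$, $H_3(x) = 8x^3 - 12x$ and
$T_n(\cos\theta) = \cos n\theta$. The grid of x values is arbitrary.

```python
import numpy as np
from scipy.special import eval_legendre, eval_hermite, eval_chebyt

x = np.linspace(-1, 1, 5)

# Legendre polynomial of order 2
P2 = eval_legendre(2, x)
print(np.allclose(P2, (3 * x**2 - 1) / 2))

# (Physicists') Hermite polynomial of order 3
H3 = eval_hermite(3, x)
print(np.allclose(H3, 8 * x**3 - 12 * x))

# Chebyshev polynomial of the first kind: T_n(cos(theta)) = cos(n theta)
T4 = eval_chebyt(4, x)
print(np.allclose(T4, np.cos(4 * np.arccos(x))))
```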

The spherical harmonics used in SciPy are defined by the formula

$$ Y_n^m(\theta, \phi) = \sqrt{\frac{2n+1}{4\pi}\,\frac{(n-m)!}{(n+m)!}}\; P_n^m(\cos\phi)\, e^{im\theta}, $$

where n = 0, 1, 2, ... is called the degree and m = −n, −n + 1, ..., n the order of the
spherical harmonic. The functions $P_n^m(x)$ are the associated Legendre polynomials. As
with so many special functions, different fields adopt different phase conventions and
normalizations, so it is important to check these carefully and make the appropriate
modifications when using them. In particular, many other fields use l for the degree
of the harmonic and reverse the definition of θ and φ. To be clear, in SciPy θ is the
azimuthal (longitudinal) angle (taking values between 0 and 2π) and φ is the polar
(colatitudinal) angle (between 0 and π).

Table 8.2 Some of the orthogonal polynomials in SciPy

Function                              Description
eval_legendre(n, x)                   Legendre polynomial, P_n(x)
eval_chebyt(n, x)                     Chebyshev polynomial of the first kind, T_n(x)
eval_chebyu(n, x)                     Chebyshev polynomial of the second kind, U_n(x)
eval_hermite(n, x)                    (Physicists’) Hermite polynomial, H_n(x)
eval_jacobi(n, alpha, beta, x)        Jacobi polynomial, P_n^(α,β)(x)
eval_laguerre(n, x)                   Laguerre polynomial of the first kind, L_n(x)
eval_genlaguerre(n, alpha, x)         Generalized Laguerre polynomial of the first kind, L_n^(α)(x)

The scipy.special.sph_harm function is called with the arguments:

scipy.special.sph_harm(m, n, theta, phi) 

where theta and phi can be array-like objects. 


Example E8.12 Visualizing the spherical harmonics is a little tricky because they are
complex and defined in terms of angular coordinates, (θ, φ). One way is to plot the real
part only on the unit sphere. Matplotlib provides a toolkit for such 3D plots, mplot3d,
as illustrated by the following code which produces Figure 8.7.⁶

Listing 8.7 The spherical harmonic defined by l = 3, m = 2

# eg8-spherical-harmonics.py 


import matplotlib.pyplot as plt 
from matplotlib import cm, colors 
from mpl_toolkits.mplot3d import Axes3D 
import numpy as np 

from scipy.special import sph_harm 

phi = np.linspace(0, np.pi, 100) 
theta = np.linspace(0, 2*np.pi, 100) 
phi, theta = np.meshgrid(phi, theta) 


# The Cartesian coordinates of the unit sphere 
x = np.sin(phi) * np.cos(theta) 
y = np.sin(phi) * np.sin(theta) 
z = np.cos(phi) 


m, l = 2, 3

# Calculate the spherical harmonic Y(l,m) and normalize to [0,1]
fcolors = sph_harm(m, l, theta, phi).real
fmax, fmin = fcolors.max(), fcolors.min()
fcolors = (fcolors - fmin)/(fmax - fmin)

# Set the aspect ratio to 1 so our sphere looks spherical
fig = plt.figure(figsize=plt.figaspect(1.))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y, z, rstride=1, cstride=1, facecolors=cm.jet(fcolors))
# Turn off the axis planes
ax.set_axis_off()
plt.show()

Figure 8.7 A depiction of the spherical harmonic defined by l = 3, m = 2.

⁶ See Section 7.2.3 and http://matplotlib.org/mpl_toolkits/mplot3d/.


8.1.8 Exercises 
Questions 

Q8.1.1 By changing a single line in the program of Example E8.1, output the 10 most 
accurately known constants (excluding those set to their values by definition). 

Q8.1.2 Use SciPy’s constants and conversion factors to calculate the number density,
N/V, of ideal gas molecules at standard temperature and pressure (T = 0 °C,
p = 1 atm). The ideal gas law is $pV = N k_\mathrm{B} T$.


Problems 


P8.1.1 Use scipy.special.binom to create a depiction of Pascal’s triangle of binomial
coefficients $\binom{n}{k}$ up to n = 8.

P8.1.2 The Airy pattern is the circular diffraction pattern resulting from a uniformly
illuminated circular aperture. It consists of a bright, central disc surrounded by fainter
rings. Its mathematical description may be written in terms of the Bessel function of the
first kind,

$$ I(\theta) = I_0 \left( \frac{2 J_1(x)}{x} \right)^2, $$

where θ is the observation angle and x = ka sin θ; a is the aperture radius and k = 2π/λ
is the angular wavenumber of the light with wavelength λ.

Plot the Airy pattern as $I(x)/I_0$ for −10 < x < 10 and deduce from the position of the
first minimum in this function the maximum resolving power (in arcsec) of the human
eye (pupil diameter 3 mm) at a wavelength of 500 nm.


P8.1.3 Write a function, get_wv, which takes a molar bond dissociation energy, $D_0$,
in kJ mol⁻¹ and returns the wavelength of a photon corresponding to that energy per
molecule, in nm. The energy of a photon with wavelength λ is E = hc/λ.

For example,

In [x]: get_wv(497)
Out[x]: 240.69731528286377


P8.1.4 An ellipsoid is the three-dimensional figure bounded by the surface described
by the equation

$$ \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1, $$

where a, b and c are the semi-principal axes. If a = b = c, the ellipsoid is a sphere. The
volume of an ellipsoid has a simple form,

$$ V = \frac{4}{3}\pi abc. $$

There is no closed formula for the surface area of a general ellipsoid, but it may be
expressed in terms of incomplete elliptic integrals of the first and second kinds, K(φ, k)
and E(φ, k):

$$ S = 2\pi c^2 + \frac{2\pi ab}{\sin\phi}\left[ K(\phi, k^2)\cos^2\phi + E(\phi, k^2)\sin^2\phi \right], $$

where

$$ \cos\phi = \frac{c}{a}, \qquad k = \frac{a\sqrt{b^2 - c^2}}{b\sqrt{a^2 - c^2}}, $$

and the coordinate system has been chosen such that a > b > c. 

Define a function, ellipsoid_surface, to calculate the surface area of a general
ellipsoid, and compare the results for different-shaped ellipsoids with the following
approximate formula:

$$ S \approx 2\pi c^2 + 2\pi ab\,r \left( 1 - \frac{b^2 - c^2}{6b^2}\, r^2 \left( 1 - \frac{3b^2 + 10c^2}{56b^2}\, r^2 \right) \right), $$

where
$$ r = \frac{\phi}{\sin\phi}. $$

P8.1.5 The drawdown or change in hydraulic head, s (a measure of the water pressure
above some geodetic datum), a distance r from a well at time t, from which water is
being pumped at a constant rate, Q, can be modeled using the Theis equation,

$$ s(r, t) = H_0 - H(r, t) = \frac{Q}{4\pi T}\, W(u), \quad \text{where } u = \frac{r^2 S}{4Tt}. $$

Here $H_0$ is the hydraulic head in the absence of the well, S is the aquifer storage
coefficient (volume of water released per unit decrease in H per unit area) and T is
the transmissivity (a measure of how much water is transported horizontally per unit
time). The Well Function, W(u), is simply the exponential integral, $E_1(u)$.

For a well being pumped at Q = 1,000 m³ day⁻¹ from an aquifer described by the
parameters $H_0$ = 20 m, S = 0.0003, T = 1,000 m² day⁻¹, determine the height of the
hydraulic head as a function of r after t = 1 day of pumping.

Compare your answer with the approximate version of the Theis equation known as
the Jacob equation, in which the well function is taken to be approximately
W(u) ≈ −γ − ln u, where γ = 0.577215664... is the Euler-Mascheroni constant.

P8.1.6 Some electronic components are cooled by annular fins (heatsinks) which conduct
heat away from the component and provide a larger surface area for that heat to
dissipate to the surroundings.

The cooling efficiency of an annular fin of width 2w and inner and outer radii $r_0$ and
$r_1$ may be written in terms of modified Bessel functions of the first and second kinds:

$$ \eta = \frac{2 r_0}{\beta (r_1^2 - r_0^2)}\, \frac{K_1(u_0) I_1(u_1) - I_1(u_0) K_1(u_1)}{K_0(u_0) I_1(u_1) + I_0(u_0) K_1(u_1)}, $$

where $u_0 = \beta r_0$, $u_1 = \beta r_1$ and

$$ \beta = \sqrt{\frac{h_c}{k w}}. $$

$h_c$ is the heat transfer coefficient (which is taken to be constant over the fin’s surface)
and k is the thermal conductivity of the fin material.

What is the cooling efficiency of an aluminium annular fin with dimensions
$r_0$ = 5 mm, $r_1$ = 10 mm, w = 0.1 mm? Take $h_c$ = 10 W m⁻² K⁻¹ and
k = 200 W m⁻¹ K⁻¹.

Calculate the heat dissipation, Q (the product of the efficiency, the fin area and the
temperature difference) for a component temperature of $T_0$ = 400 K and ambient
temperature $T_e$ = 300 K.



8.2 Integration and ordinary differential equations 

The scipy.integrate package contains functions for computing definite integrals. It
can evaluate both proper (with finite limits) and improper (infinite limits) integrals. It
can also perform numerical integration of systems of ordinary differential equations.

8.2.1 Definite integrals of a single variable

The basic numerical integration routine is scipy.integrate.quad, which is based
on the venerable FORTRAN 77 QUADPACK library. It uses adaptive quadrature to
approximate the value of an integral by dividing its domain into subintervals that are
chosen iteratively to meet a particular tolerance (that is, estimated absolute or relative
error). In its simplest form, it takes three arguments: a Python function object corresponding
to the function to integrate, func, and the limits of integration, a and b. func
must take at least one argument; if it takes more than one it is integrated along the
coordinate corresponding to the first argument. In simple usage, lambda expressions are
a convenient way to define func. For example, to evaluate $\int_1^4 x^{-2}\,\mathrm{d}x = \frac{3}{4}$ numerically:

In [x]: from scipy.integrate import quad
In [x]: f = lambda x: 1/x**2
In [x]: quad(f, 1, 4)
Out[x]: (0.7500000000000002, 1.913234548258995e-09)

quad returns two values in a tuple: the value of the integral and an estimate of the
absolute error in the result.

Use np.inf to evaluate improper integrals:

In [x]: quad(lambda x: np.exp(-x**2), 0, np.inf)
Out[x]: (0.8862269254527579, 7.101318390472462e-09)
In [x]: np.sqrt(np.pi)/2    # analytical result
Out[x]: 0.88622692545275794


Note that in this call to quad we didn’t even give the function a name but simply passed 
it as an anonymous lambda object. 

More complicated functions require a Python function object defined with def: 


In [x]: def g(x):
   ...:     if abs(x) < 0.5:
   ...:         return -x
   ...:     return x - np.sign(x)


In [x]: quad(g, -0.6, 0.8) 

Out[x]: (-0.06000000000000002, 6.661338147750941e-17) 


Functions with singularities or discontinuities can cause problems for the numerical
quadrature routine even if the required integral is well-defined. For example, the sinc
function, f(x) = sin(x)/x, has a removable singularity at x = 0, which causes the
following simple application of quad to fail:

In [x]: sinc = lambda x: np.sin(x)/x
In [x]: quad(sinc, -2, 2)
...: RuntimeWarning: invalid value encountered in double_scalars
Out[x]: (nan, nan)

The solution is to configure quad by passing a list of such break points to the points
argument (the list does not have to be ordered):

In [x]: quad(sinc, -2, 2, points=[0,])
Out[x]: (3.210825953605389, 3.5647329017567276e-14)

Note that break points cannot be specified with infinite limits.

The arguments epsrel and epsabs allow the specification of a desired accuracy of
the quadrature as a relative or absolute tolerance. The default values are both 1.49e-8,
but the integration can be done faster if a less-accurate answer is required. As an example,
consider integrating the rapidly varying function, $f(x) = e^{-|x|}\sin^2 x^2$:

In [x]: f = lambda x: np.sin(x**2)**2 * np.exp(-np.abs(x))
In [x]: quad(f, -1, 2, epsabs=0.1)
Out[x]: (0.29551455828969975, 0.001529571827911671)
In [x]: quad(f, -1, 2, epsabs=1.49e-8)    # (the default absolute tolerance)
Out[x]: (0.29551455505239044, 4.449763315720537e-10)

Note that epsabs is only a requested upper bound: the error estimate reported by quad
may be much smaller than the requested tolerance, and the true error in the result may
be smaller still than this estimate.
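To see what a looser tolerance buys, one can ask quad for its diagnostic information. The sketch below assumes the full_output=True option of quad, which appends an information dictionary (including the number of function evaluations, 'neval') to the returned tuple; the variable names are illustrative.

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: np.sin(x**2)**2 * np.exp(-np.abs(x))

# With full_output=True the third element of the result is an info dictionary
res_loose = quad(f, -1, 2, epsabs=0.1, full_output=True)
res_tight = quad(f, -1, 2, epsabs=1.49e-8, full_output=True)

loose, loose_info = res_loose[0], res_loose[2]
tight, tight_info = res_tight[0], res_tight[2]

print('loose: {:.10f} using {} evaluations'.format(loose, loose_info['neval']))
print('tight: {:.10f} using {} evaluations'.format(tight, tight_info['neval']))
```

The looser tolerance should need no more function evaluations than the default, which is where any saving in time comes from.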

If a function takes one or more parameters in addition to its principal argument, these
need to be passed to quad as a tuple in args. For example, the integral

$$ \int_{-\pi/2}^{\pi/2} \sin^n x \cos^m x\,\mathrm{d}x $$

can be evaluated numerically with

In [x]: def f(x, n, m):
   ...:     return np.sin(x)**n * np.cos(x)**m

In [x]: n, m = 2, 1
In [x]: quad(f, -np.pi/2, np.pi/2, args=(n, m))
Out[x]: (0.6666666666666666, 1.625746841018571e-13)

Note that the additional parameters, n and m here, appear as arguments to our function
after the coordinate to be integrated over (x).
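The args mechanism makes it easy to reuse one integrand for many parameter values. As a sketch, the loop below compares the numerical integral with a closed form: for even n the integral equals the beta function B((n+1)/2, (m+1)/2), and for odd n it vanishes by symmetry. The helper name exact is an assumption made for this illustration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import beta

def f(x, n, m):
    return np.sin(x)**n * np.cos(x)**m

def exact(n, m):
    # Odd n gives an odd integrand over a symmetric interval
    if n % 2:
        return 0.0
    return beta((n + 1) / 2, (m + 1) / 2)

results = []
for n, m in [(2, 1), (2, 2), (3, 2), (4, 3)]:
    value, _ = quad(f, -np.pi/2, np.pi/2, args=(n, m))
    results.append((value, exact(n, m)))
    print(n, m, value, exact(n, m))
```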


Example E8.13 Consider a torus of average radius R and cross-sectional radius r. The
volume of this shape may be evaluated analytically in Cartesian coordinates as a volume
of revolution:

$$ V = 2\int_{R-r}^{R+r} 2\pi x \sqrt{r^2 - (x - R)^2}\,\mathrm{d}x. $$

The center of the torus is at the origin and the z axis is taken to be its symmetry axis.

The integral is tedious but yields to standard methods: $V = 2\pi^2 R r^2$. Here we take a
numerical approach with the values R = 4, r = 1:

In [x]: R, r = 4, 1
In [x]: f = lambda x, R, r: x * np.sqrt(r**2 - (x-R)**2)
In [x]: V, _ = quad(f, R-r, R+r, args=(R, r))
In [x]: V *= 4 * np.pi
In [x]: Vexact = 2 * np.pi**2 * R * r**2

In [x]: print('V = {} (exact: {})'.format(V, Vexact))
V = 78.95683520871499 (exact: 78.95683520871486)

8.2.2 Integrals of two and more variables

The scipy.integrate functions dblquad, tplquad and nquad evaluate double,
triple and multiple integrals respectively. Because, in general, the limits on one coordinate
may depend on another coordinate, the syntax for calling these functions is a
little more complicated.

dblquad evaluates the double integral:

$$ \int_a^b \int_{g(x)}^{h(x)} f(x, y)\,\mathrm{d}y\,\mathrm{d}x. $$

It is passed f(x, y) as a function of at least two variables, func(y, x, ...). The
function must take y as its first argument and x as its second argument. The integral
limits are passed to dblquad in four further arguments. First, the two arguments, a and
b, specify the lower and upper limits on the x-integral respectively, as for quad. The
next two arguments, gfun and hfun, are the lower and upper limits on the y-integral
and they must be callable objects taking a single floating point argument, the value of x
at which the limit applies (i.e., they must themselves be functions of x). If either of the
y-integral limits does not depend on x, gfun or hfun can return a constant value.

As a simple example, the integral

$$ \int_1^4 \int_0^2 x^2 y\,\mathrm{d}y\,\mathrm{d}x $$

can be evaluated with

In [x]: f = lambda y, x: x**2 * y
In [x]: a, b = 1, 4
In [x]: gfun = lambda x: 0
In [x]: hfun = lambda x: 2
In [x]: dblquad(f, a, b, gfun, hfun)
Out[x]: (42.00000000000001, 4.662936703425658e-13)

Here, gfun and hfun are each called with a value of x, but they return a constant
(0 and 2 respectively) no matter what this value is.

Of course, it is possible to wrap all of this into a single line:

In [x]: dblquad(lambda y, x: x**2 * y, 1, 4, lambda x: 0, lambda x: 2)
Out[x]: (42.00000000000001, 4.662936703425658e-13)
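The example above has constant limits on both coordinates. As a short sketch of the more general case, in which the inner limit really does depend on x, consider integrating f(x, y) = xy over the triangle 0 ≤ x ≤ 1, 0 ≤ y ≤ x, for which the exact answer is 1/8. The names lower and upper are chosen here for illustration.

```python
from scipy.integrate import dblquad

# The integrand takes y first, then x
integrand = lambda y, x: x * y
lower = lambda x: 0      # lower y-limit (constant)
upper = lambda x: x      # upper y-limit depends on x

value, abserr = dblquad(integrand, 0, 1, lower, upper)
print(value, abserr)
```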

A double integral can be used to find the area of some two-dimensional shape
bounded by one or more functions. For an example in polar coordinates, consider the
area inside the curve r = 2 + 2 sin θ but outside the circle defined by r = 2 for θ in
[0, 2π] (see Figure 8.8). These curves intersect at θ = 0, π so the required integral is

$$ A = \int_0^{\pi} \int_2^{2 + 2\sin\theta} r\,\mathrm{d}r\,\mathrm{d}\theta, $$

where r dr dθ is the infinitesimal area element in polar coordinates. This particular
integral is fairly straightforward to evaluate analytically (A = 8 + π), so the numerical
result is easy to check:

In [x]: r1, r2 = lambda theta: 2, lambda theta: 2 + 2*np.sin(theta)
In [x]: A, _ = dblquad(lambda r, theta: r, 0, np.pi, r1, r2)
In [x]: A
Out[x]: 11.141592653589791
In [x]: 8 + np.pi    # exact answer
Out[x]: 11.141592653589793

Figure 8.8 The region defined as the area inside r = 2 + 2 sin θ but outside the circle r = 2.

The function to evaluate is simply r, defined by lambda r, theta: r; in the inner
integral the limits on r are 2 and 2 + 2 sin θ; for the outer integral θ ranges from 0 to π.

The function tplquad evaluates triple integrals and takes a function of three variables,
func(z, y, x), and six further arguments: constant x-limits, a and b; y-limits
gfun(x) and hfun(x) (which are functions, as for dblquad); and z-limits qfun(x, y)
and rfun(x, y) (functions of x and y in that order).

Higher dimensional integrations are handled by the scipy.integrate.nquad
function, which will not be discussed here (documentation and examples are available
online).⁷

⁷ http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.nquad.html#scipy.integrate.nquad

Example E8.14 The volume of the unit sphere, 4π/3, can be expressed as a triple
integral in spherical polar coordinates with constant limits:

$$ V = \int_0^{2\pi} \int_0^{\pi} \int_0^1 r^2 \sin\theta\,\mathrm{d}r\,\mathrm{d}\theta\,\mathrm{d}\phi. $$

In [x]: from scipy.integrate import tplquad
In [x]: tplquad(lambda phi, theta, r: r**2 * np.sin(theta),
   ...:         0, 1,
   ...:         lambda r: 0, lambda r: np.pi,
   ...:         lambda r, theta: 0, lambda r, theta: 2*np.pi)
Out[x]: (4.18879020478639, 4.650491330678174e-14)


Or in Cartesian coordinates with limits as functions:

$$ V = 8 \int_0^1 \int_0^{\sqrt{1 - x^2}} \int_0^{\sqrt{1 - x^2 - y^2}} \mathrm{d}z\,\mathrm{d}y\,\mathrm{d}x, $$

where the integral is in the positive octant of the three-dimensional Cartesian axes.

In [x]: A, _ = tplquad(lambda z, y, x: 1,
   ...:                0, 1,
   ...:                lambda x: 0, lambda x: np.sqrt(1 - x**2),
   ...:                lambda x, y: 0, lambda x, y: np.sqrt(1 - x**2 - y**2))
In [x]: 8*A
Out[x]: 4.188790204786391
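For comparison, the same spherical-polar integral can be written with scipy.integrate.nquad, which takes the limits as a list of ranges rather than as separate arguments. This is only a minimal sketch of one way to call it: the integrand's arguments are listed innermost first, and each range is given in the same order.

```python
import numpy as np
from scipy.integrate import nquad

# Integrand in spherical polar coordinates; nquad integrates over the
# first argument innermost, so the order here is (r, theta, phi).
def integrand(r, theta, phi):
    return r**2 * np.sin(theta)

# One [lower, upper] pair per argument, in the same order
ranges = [[0, 1], [0, np.pi], [0, 2 * np.pi]]

volume, abserr = nquad(integrand, ranges)
print(volume, 4 * np.pi / 3)
```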


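Example E8.15 This example finds the mass and center of mass of the tetrahedron
bounded by the coordinate planes and the plane x + y + z = 1 with density ρ = ρ(x, y, z),
where ρ(x, y, z) is provided as a lambda function. We test it with the functions ρ = 1,
ρ = x and ρ = x² + y² + z².

The mass may be written as a triple integral of the density over the volume of the
tetrahedron:

$$ m = \int_0^1 \int_0^{1-x} \int_0^{1-x-y} \rho(x, y, z)\,\mathrm{d}z\,\mathrm{d}y\,\mathrm{d}x, $$

and the coordinates of the center of mass are given by

$$ \bar{x} = \frac{1}{m}\iiint x\rho\,\mathrm{d}z\,\mathrm{d}y\,\mathrm{d}x, \quad \bar{y} = \frac{1}{m}\iiint y\rho\,\mathrm{d}z\,\mathrm{d}y\,\mathrm{d}x, \quad \bar{z} = \frac{1}{m}\iiint z\rho\,\mathrm{d}z\,\mathrm{d}y\,\mathrm{d}x, $$

with the same limits as for the mass integral.

The following program uses scipy.integrate.tplquad to perform the necessary
integrations (which can also be solved analytically).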

Listing 8.8 Calculating the mass and center of mass of a tetrahedron given three different densities

# eg8-tetrahedron-cofm.py
import numpy as np
from scipy.integrate import tplquad

# The integration limits on x, y, z:
a, b = 0, 1
gfun, hfun = lambda x: 0, lambda x: 1 - x
qfun, rfun = lambda x, y: 0, lambda x, y: 1 - x - y
lims = (a, b, gfun, hfun, qfun, rfun)        # ➊

# The three different density functions, rho(x, y, z)
rhos = [lambda x, y, z: 1,
        lambda x, y, z: x,
        lambda x, y, z: x**2 + y**2 + z**2]

for rho in rhos:
    # The mass as a triple integral of rho over the volume. tplquad calls
    # its integrand with the arguments in the order (z, y, x).
    m, _ = tplquad(lambda z, y, x: rho(x, y, z), *lims)
    # The center of mass (xbar, ybar, zbar)
    mxbar, _ = tplquad(lambda z, y, x: x * rho(x, y, z), *lims)
    mybar, _ = tplquad(lambda z, y, x: y * rho(x, y, z), *lims)
    mzbar, _ = tplquad(lambda z, y, x: z * rho(x, y, z), *lims)
    xbar, ybar, zbar = mxbar / m, mybar / m, mzbar / m
    print('mass = {:g}, CofM = ({:g}, {:g}, {:g})'.format(m, xbar, ybar, zbar))

➊ Note that the six arguments representing the limits on the triple integral (two constants
and two pairs of lambda functions) have been packed into a tuple, lims (the
parentheses are optional here).

The output is: 

mass = 0.166667, CofM = (0.25, 0.25, 0.25) 

mass = 0.0416667, CofM = (0.4, 0.2, 0.2) 

mass = 0.05, CofM = (0.277778, 0.277778, 0.277778) 


8.2.3 Ordinary differential equations 

Ordinary differential equations can be solved numerically with scipy.integrate.odeint.
This function is based on the well-tested Fortran LSODA routine, which can
automatically switch between stiff and nonstiff algorithms.⁸ odeint solves first-order
differential equations: to solve a higher-order equation, it must be decomposed into a
system of first-order equations first, as explained later.

A single first-order ordinary differential equation

In its simplest use for the solution of a single first-order ordinary differential equation,

$$ \frac{\mathrm{d}y}{\mathrm{d}t} = f(y, t), $$

odeint takes three arguments: a function object returning dy/dt, an initial condition,
$y_0$, and a sequence of t values at which to calculate the solution, y(t).

For example, consider the first-order differential equation describing the rate of the
reaction A → P in terms of the concentration of the reactant, A:

$$ \frac{\mathrm{d}[\mathrm{A}]}{\mathrm{d}t} = -k[\mathrm{A}]. $$

This example has an easily obtainable analytical solution:

$$ [\mathrm{A}] = [\mathrm{A}]_0\, e^{-kt}, $$

where $[\mathrm{A}]_0$ is the initial concentration of A.

⁸ A differential equation is said to be stiff if a numerical method is required to take excessively small steps in
its intervals of integration in relation to the smoothness of the exact underlying solution.

Figure 8.9 Exponential decay of a reactant in a first-order reaction: numerical and exact solutions.


To solve the equation numerically with odeint, write it in the form shown above,
with a single dependent variable, y(t) = [A], which is a function of the independent
variable, t (time). We have:

$$ \frac{\mathrm{d}y}{\mathrm{d}t} = -ky. $$

We need to provide a function returning dy/dt as f(y, t) (in general a function of both y
and t), an initial condition, y(0), and a sequence of time points upon which to calculate
the solution. The derivative function is simply:

def dydt(y, t):
    return -k * y

(the order of the arguments is important). A program comparing the numerical and
analytical results for a reaction with k = 0.2 s⁻¹ and y(0) = $[\mathrm{A}]_0$ = 100 is given below;
the resulting plot is Figure 8.9.

Listing 8.9 First-order reaction kinetics

import numpy as np
from scipy.integrate import odeint
import pylab

# First-order reaction rate constant, s-1
k = 0.2

# Initial condition on y: 100% of reactant is present at t=0
y0 = 100

# A suitable grid of time points for the reaction
t = np.linspace(0, 20, 20)

def dydt(y, t):
    """ Return dy/dt = f(y,t) at time t. """
    return -k * y

# Integrate the differential equation
y = odeint(dydt, y0, t)

# Plot and compare the numerical and exact solutions
pylab.plot(t, y, 'o', color='k', label=r'\texttt{odeint}')
pylab.plot(t, y0 * np.exp(-k*t), color='gray', label='Exact')
pylab.xlabel(r'$t\;/\mathrm{s}$')
pylab.ylabel(r'Remaining reactant (\%)')
pylab.legend()
pylab.show()


As with the quad family of routines, if the function returning the derivative requires
further arguments, they can be passed to odeint in the args parameter. In the example
above, k is resolved in global scope, but we could pass it with:

def dydt(y, t, k):
    return -k * y

(note that additional parameters must appear after the dependent and independent variables).
The call to odeint would then be:

y = odeint(dydt, y0, t, args=(k,))
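More recent versions of SciPy also provide scipy.integrate.solve_ivp, which is not covered in this section. The sketch below solves the same first-order decay problem with it, assuming a SciPy version recent enough for solve_ivp to accept the args keyword. Note that it expects the derivative function's arguments in the order (t, y), the reverse of odeint.

```python
import numpy as np
from scipy.integrate import solve_ivp

k, y0 = 0.2, 100
t = np.linspace(0, 20, 20)

# solve_ivp expects f(t, y, ...): the reverse of odeint's (y, t, ...)
def dydt(t, y, k):
    return -k * y

sol = solve_ivp(dydt, (t[0], t[-1]), [y0], t_eval=t, args=(k,),
                rtol=1e-8, atol=1e-10)

y = sol.y[0]                      # sol.y holds one row per dependent variable
exact = y0 * np.exp(-k * t)
print(np.max(np.abs(y - exact)))
```

Here the solution is returned with one row per dependent variable, so no transpose is needed to pick out y.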


Coupled first-order ordinary differential equations 

odeint can also solve a set of coupled first-order differential equations in more than
one dependent variable, $y_1(t), y_2(t), \ldots, y_n(t)$:

$$ \begin{aligned}
\frac{\mathrm{d}y_1}{\mathrm{d}t} &= f_1(y_1, y_2, \ldots, y_n; t) \\
\frac{\mathrm{d}y_2}{\mathrm{d}t} &= f_2(y_1, y_2, \ldots, y_n; t) \\
&\;\;\vdots \\
\frac{\mathrm{d}y_n}{\mathrm{d}t} &= f_n(y_1, y_2, \ldots, y_n; t)
\end{aligned} $$

In this case, the function passed to odeint must return a sequence of derivatives,
$\mathrm{d}y_1/\mathrm{d}t, \mathrm{d}y_2/\mathrm{d}t, \ldots, \mathrm{d}y_n/\mathrm{d}t$, for each of the dependent variables; that is, it evaluates
the functions $f_i(y_1, y_2, \ldots, y_n; t)$ above for each of the $y_i$ passed to it in a
sequence, y. The form of this function is:

def deriv(y, t):
    # y = [y1, y2, y3, ...] is a sequence of dependent variables
    dy1dt = f1(y, t)    # calculate dy1/dt as f1(y1,y2,...,yn;t)
    dy2dt = f2(y, t)    # calculate dy2/dt as f2(y1,y2,...,yn;t)
    # ... etc.
    # Return the derivatives in a sequence such as a tuple:
    return dy1dt, dy2dt, ..., dyndt

For a concrete example, suppose a reaction proceeds via two first-order reaction steps:
A → B → P with rate constants $k_1$ and $k_2$. The equations governing the rate of change
of A and B are

$$ \frac{\mathrm{d}[\mathrm{A}]}{\mathrm{d}t} = -k_1[\mathrm{A}], \qquad \frac{\mathrm{d}[\mathrm{B}]}{\mathrm{d}t} = k_1[\mathrm{A}] - k_2[\mathrm{B}]. $$

Again, we can solve this pair of coupled equations analytically, but in our numerical
solution, let $y_1$ = [A] and $y_2$ = [B]:

$$ \frac{\mathrm{d}y_1}{\mathrm{d}t} = -k_1 y_1, \qquad \frac{\mathrm{d}y_2}{\mathrm{d}t} = k_1 y_1 - k_2 y_2. $$

The code below integrates these equations for $k_1$ = 0.2 s⁻¹, $k_2$ = 0.8 s⁻¹
and initial conditions $y_1(0)$ = 100, $y_2(0)$ = 0, and compares with the analytical result
(Figure 8.10).

Figure 8.10 Two coupled first-order reactions: numerical and exact solutions.


Listing 8.10 Two coupled first-order reactions

import numpy as np
from scipy.integrate import odeint
import pylab

# First-order reaction rate constants, s-1
k1, k2 = 0.2, 0.8

# Initial conditions on y1, y2: [A](t=0) = 100, [B](t=0) = 0
A0, B0 = 100, 0

# A suitable grid of time points for the reaction
t = np.linspace(0, 20, 100)

def dydt(y, t, k1, k2):
    """ Return dy_i/dt = f(y_i, t) at time t. """
    y1, y2 = y
    dy1dt = -k1 * y1
    dy2dt = k1 * y1 - k2 * y2
    return dy1dt, dy2dt

# Integrate the differential equation
y0 = A0, B0
y1, y2 = odeint(dydt, y0, t, args=(k1, k2)).T        # ➊
A, B = y1, y2
# [P] is determined by conservation
P = A0 - A - B

# Analytical result
Aexact = A0 * np.exp(-k1*t)
Bexact = A0 * k1/(k2-k1) * (np.exp(-k1*t) - np.exp(-k2*t))
Pexact = A0 - Aexact - Bexact

pylab.plot(t, A, 'o', label='[A]')
pylab.plot(t, B, '^', label='[B]')
pylab.plot(t, P, 'd', label='[P]')
pylab.plot(t, Aexact)
pylab.plot(t, Bexact)
pylab.plot(t, Pexact)
pylab.xlabel(r'$t\;/\mathrm{s}$')
pylab.ylabel('Concentration (arb. units)')
pylab.legend()
pylab.show()


➊ Note that odeint returns a two-dimensional array with one row per time point and
the values of each dependent variable in the columns: if we want to unpack this array to
separate one-dimensional arrays, y1, y2, and so on, we need the transpose of this returned array.
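The layout of the array returned by odeint is easy to confirm by inspecting its shape. The following short sketch reuses the rate constants and initial conditions of the two-step reaction above; the name sol is chosen for this example.

```python
import numpy as np
from scipy.integrate import odeint

def dydt(y, t, k1, k2):
    y1, y2 = y
    return -k1 * y1, k1 * y1 - k2 * y2

t = np.linspace(0, 20, 100)
sol = odeint(dydt, (100, 0), t, args=(0.2, 0.8))

print(sol.shape)      # (number of time points, number of variables)
y1, y2 = sol.T        # transpose so that each variable occupies one row
print(y1.shape, y2.shape)
```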

A single second-order ordinary differential equation 

To solve an ordinary differential equation of higher than first order, it must first be
reduced into a system of first-order differential equations. In general, any differential
equation with a single dependent variable of order n can be written as a system of n
first-order differential equations in n dependent variables.

For example, the equation of motion for a harmonic oscillator is a second-order
differential equation:

$$ \frac{\mathrm{d}^2 x}{\mathrm{d}t^2} = -\omega^2 x, $$

where x is the displacement from equilibrium and ω is the angular frequency. This
equation may be decomposed into two first-order equations as follows:

$$ \frac{\mathrm{d}x_1}{\mathrm{d}t} = x_2, \qquad \frac{\mathrm{d}x_2}{\mathrm{d}t} = -\omega^2 x_1, $$

where $x_1$ is identified with x and $x_2$ with dx/dt.

This pair of coupled first-order equations may be solved as before:


Listing 8.11 Solution of the harmonic oscillator equation of motion

import numpy as np
from scipy.integrate import odeint
import pylab

# Harmonic oscillator frequency (s-1)
omega = 0.9

# Initial conditions on x1=x and x2=dx/dt at t=0
A, v0 = 3, 0        # cm, cm.s-1
x0 = A, v0

# A suitable grid of time points
t = np.linspace(0, 20, 100)

def dxdt(x, t, omega):
    """ Return dx/dt = f(x,t) at time t. """
    x1, x2 = x
    dx1dt = x2
    dx2dt = -omega**2 * x1
    return dx1dt, dx2dt

# Integrate the differential equation
x1, x2 = odeint(dxdt, x0, t, args=(omega,)).T

# Plot and compare the numerical and exact solutions
pylab.plot(t, x1, 'o', color='k', label=r'\texttt{odeint()}')
pylab.plot(t, A * np.cos(omega * t), color='gray', label='Exact')
pylab.xlabel(r'$t\;/\mathrm{s}$')
pylab.ylabel(r'$x\;/\mathrm{cm}$')
pylab.legend()
pylab.show()


The plot produced by this code is given in Figure 8.11.

The odeint function is a simplified interface to the more advanced
scipy.integrate.ode class, which provides a range of different numerical integrators,
including Runge-Kutta algorithms and support for complex-valued variables.
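As a rough illustration of this lower-level interface, the sketch below repeats the harmonic oscillator calculation with scipy.integrate.ode and its explicit Runge-Kutta integrator 'dopri5'. The tolerances and variable names are assumptions made for this example; note that ode expects the derivative function's arguments in the order (t, x), the reverse of odeint.

```python
import numpy as np
from scipy.integrate import ode

omega = 0.9
A, v0 = 3, 0

# Note the argument order: ode expects f(t, x, ...), unlike odeint
def dxdt(t, x, omega):
    x1, x2 = x
    return [x2, -omega**2 * x1]

solver = ode(dxdt).set_integrator('dopri5', rtol=1e-8, atol=1e-10)
solver.set_initial_value([A, v0], 0).set_f_params(omega)

t = np.linspace(0, 20, 100)
x1 = [A]
for ti in t[1:]:
    solver.integrate(ti)          # advance the solution to time ti
    x1.append(solver.y[0])
x1 = np.array(x1)

print(np.max(np.abs(x1 - A * np.cos(omega * t))))
```

Unlike odeint, the solution is advanced step by step under the caller's control, which is convenient when the integration has to stop on some condition.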


Figure 8.11 The harmonic oscillator: numerical and exact solutions.

Example E8.16 An object falling slowly in a viscous fluid under the influence of gravity
is subject to a drag force (Stokes drag), which varies linearly with its velocity. Its
equation of motion may be written as the second-order differential equation:

$$ m\frac{\mathrm{d}^2 z}{\mathrm{d}t^2} = -c\frac{\mathrm{d}z}{\mathrm{d}t} + mg', $$

where z is the object’s position as a function of time, t, c is a drag constant which
depends on the shape of the object and the fluid viscosity and

$$ g' = g\left(1 - \frac{\rho_\mathrm{fluid}}{\rho_\mathrm{obj}}\right) $$

is the effective gravitational acceleration, which accounts for the buoyant force due to
the fluid (density $\rho_\mathrm{fluid}$) displaced by the object (density $\rho_\mathrm{obj}$). For a small sphere of
radius r in a fluid of viscosity η, Stokes’ law predicts c = 6πηr.

Consider a sphere of platinum (ρ = 21.45 g cm⁻³) with radius 1 mm, initially at
rest, falling in mercury (ρ = 13.53 g cm⁻³, η = 1.53 × 10⁻³ Pa s). The second-order
differential equation above can be solved analytically, but to integrate
it numerically using odeint, it must be treated as two first-order ordinary differential
equations:

$$ \frac{\mathrm{d}z}{\mathrm{d}t} = \dot{z}, \qquad \frac{\mathrm{d}\dot{z}}{\mathrm{d}t} = g' - \frac{c}{m}\dot{z}. $$

In the code below, the function deriv calculates these derivatives and is
passed to odeint with the initial conditions (z = 0, ż = 0) and a grid of time points.

Listing 8.12 Calculating the motion of a sphere falling under the influence of gravity and Stokes drag

# eg8-stokes-drag.py
import numpy as np
from scipy.integrate import odeint
import pylab

# Pt sphere falling from rest in mercury
# Acceleration due to gravity (m.s-2)
g = 9.81
# Densities (kg.m-3)
rho_Pt, rho_Hg = 21450, 13530
# Viscosity of Hg (Pa.s)
eta = 1.53e-3

# Radius and mass of the sphere
r = 1.e-3       # radius (m)
m = 4*np.pi/3 * r**3 * rho_Pt
# Drag constant from Stokes' Law:
c = 6 * np.pi * eta * r
# Effective gravitational acceleration
gp = g * (1 - rho_Hg/rho_Pt)

def deriv(z, t, m, c, gp):
    """ Return the dz/dt and d2z/dt2. """
    dz0 = z[1]
    dz1 = gp - c/m * z[1]
    return dz0, dz1

t = np.linspace(0, 20, 50)
# Initial conditions: z = 0, dz/dt = 0 at t=0
z0 = (0, 0)
# Integrate the pair of differential equations
z, zdot = odeint(deriv, z0, t, args=(m, c, gp)).T
pylab.plot(t, zdot)
print('Estimate of terminal velocity = {:.3f} m.s-1'.format(zdot[-1]))

# Exact solution: terminal velocity vt (m.s-1) and characteristic time tau (s)
v0, vt, tau = 0, m*gp/c, m/c
print('Exact terminal velocity = {:.3f} m.s-1'.format(vt))
z = vt*t + v0*tau*(1-np.exp(-t/tau)) + vt*tau*(np.exp(-t/tau)-1)
zdot_exact = vt + (v0-vt)*np.exp(-t/tau)
pylab.plot(t, zdot_exact)
pylab.xlabel('$t$ /s')
pylab.ylabel(r'$\dot{z}\;/\mathrm{m\,s^{-1}}$')
pylab.show()


The plot produced by this program is shown in Figure 8.12: the numerical and
analytical results are indistinguishable at this scale but are reported to three decimal places
in the output:

Estimate of terminal velocity = 11.266 m.s-1 
Exact terminal velocity = 11.285 m.s-1 


Figure 8.12 The velocity of a platinum sphere falling in mercury as a function of time, modeled
with Stokes’ law.

8.2.4 Exercises
Questions

Q8.2.1 Use scipy.integrate.quad to evaluate the following integral:


f 


UJ 


X 

L2J 


dx. 


Q8.2.2 Use scipy.integrate.quad to evaluate the following definite integrals
(most of which can also be expressed in closed form over the range given but are
awkward).

a. $$ \int_0^1 \frac{x^4 (1 - x)^4}{1 + x^2}\,\mathrm{d}x. $$
   (Compare with 22/7 − π.)

b. The following integral appears in the Debye theory of the heat capacity of crystals
   at low temperature:
   $$ \int_0^\infty \frac{x^3}{e^x - 1}\,\mathrm{d}x. $$
   (Compare with π⁴/15.)

c. The integral sometimes known as the Sophomore’s dream:
   $$ \int_0^1 x^{-x}\,\mathrm{d}x. $$
   (Compare the value you obtain with the summation $\sum_{n=1}^{\infty} n^{-n}$.)

d. $$ \int_0^1 [\ln(1/x)]^p\,\mathrm{d}x. $$
   (Compare with p! for integer 0 ≤ p ≤ 10.)

e. $$ \int_0^{2\pi} e^{z\cos\theta}\,\mathrm{d}\theta. $$
   (Compare with $2\pi I_0(z)$, where $I_0(z)$ is a modified Bessel function of the first
   kind, for 0 ≤ z ≤ 2.)


Q8.2.3 Use scipy.integrate.dblquad to evaluate π by integration of the constant
function f(x, y) = 4 over the quarter circle with unit radius in the quadrant x > 0, y > 0.

Q8.2.4 What is wrong with the following attempt to calculate the area of the unit circle
(π) as a double integral in polar coordinates?

In [x]: dblquad(lambda r, theta: r, 0, 1, lambda r: 0, lambda r: 2*np.pi) 

Out[x]: (19.739208802178712, 2.1914924100062363e-13) 


Problems 

P8.2.1 The area of the surface of revolution about the x-axis between a and b of the
function y = f(x) is given by the integral

$$ S = 2\pi \int_a^b y\,\mathrm{d}s, \quad \text{where } \mathrm{d}s = \sqrt{1 + \left(\frac{\mathrm{d}y}{\mathrm{d}x}\right)^2}\,\mathrm{d}x. $$

Use this equation to write a function to determine the surface area of revolution of a
function y = f(x) about the x-axis, given Python function objects that return y and
dy/dx, and test it for the paraboloid obtained by rotation of the function $f(x) = \sqrt{x}$ about
the x-axis between a = 0 and b = 1. Compare with the exact result, $\pi(5^{3/2} - 1)/6$.


P8.2.2 The integral of the secant function,

$$\int_0^\theta \sec\phi \, d\phi$$

for $-\pi/2 < \theta < \pi/2$, is important in navigation and the theory of map projections. It
can be expressed in closed form as the inverse Gudermannian function,

$$\mathrm{gd}^{-1}(\theta) = \ln |\sec\theta + \tan\theta|.$$

Use scipy.integrate.quad to calculate values for the integral across the relevant
range for $\theta$ given earlier and compare graphically with the exact answer.


P8.2.3 Consider a torus of uniform density, unit mass, average radius R and cross-sectional
radius r. The volume and moments of inertia of such a torus may be evaluated
analytically and give the results:

$$V = 2\pi^2 R r^2,$$
$$I_z = R^2 + \tfrac{3}{4} r^2,$$
$$I_x = I_y = \tfrac{1}{2} R^2 + \tfrac{5}{8} r^2,$$

where the center of mass of the torus is at the origin and the z axis is taken to be its
symmetry axis.

Here we take a numerical approach. In cylindrical coordinates $(\rho, \theta, z)$, it may be
shown that:

$$V = 2 \int_0^{2\pi} \int_{R-r}^{R+r} \int_0^{\sqrt{r^2 - (\rho - R)^2}} \rho \, dz \, d\rho \, d\theta,$$
$$I_z = \frac{2}{V} \int_0^{2\pi} \int_{R-r}^{R+r} \int_0^{\sqrt{r^2 - (\rho - R)^2}} \rho^3 \, dz \, d\rho \, d\theta,$$
$$I_x = I_y = \frac{2}{V} \int_0^{2\pi} \int_{R-r}^{R+r} \int_0^{\sqrt{r^2 - (\rho - R)^2}} (\rho^2 \sin^2\theta + z^2) \rho \, dz \, d\rho \, d\theta.$$

Evaluate these integrals for the torus with dimensions $R = 4$, $r = 1$ and compare
with the exact values.

P8.2.4 The Brusselator is a theoretical model for an autocatalytic reaction. It assumes 
the following reaction sequence, in which species A and B are taken to be in excess 
with constant concentration and species D and E are removed as they are produced. 
The concentrations of species X and Y can show oscillatory behavior under certain 
conditions. 


$$A \xrightarrow{k_1} X$$
$$2X + Y \xrightarrow{k_2} 3X$$
$$B + X \xrightarrow{k_3} Y + D$$
$$X \xrightarrow{k_4} E$$

It is convenient to introduce the scaled quantities

$$x = [X] \sqrt{\frac{k_2}{k_4}}, \quad y = [Y] \sqrt{\frac{k_2}{k_4}}, \quad a = [A] \frac{k_1}{k_4} \sqrt{\frac{k_2}{k_4}}, \quad b = [B] \frac{k_3}{k_4},$$

and to scale the time by the factor $k_4$, which gives rise to the dimensionless equations

$$\frac{dx}{dt} = a - (1 + b)x + x^2 y,$$
$$\frac{dy}{dt} = bx - x^2 y.$$

Show how these equations predict x and y to vary for (a) $a = 1$, $b = 1.8$ and (b)
$a = 1$, $b = 2.02$ by plotting in each case (i) x, y as functions of (dimensionless) time
and (ii) y as a function of x.

P8.2.5 The equation governing the motion of a pendulum consisting of a mass at the
end of a light, rigid rod of length l may be written

$$\frac{d^2\theta}{dt^2} = -\frac{g}{l} \sin\theta,$$

where $\theta$ is the angle the pendulum makes with the vertical.

Taking $l = 1$ m and $g = 9.81$ m s$^{-2}$, determine the subsequent motion of the
pendulum if it is started at rest with an initial angle $\theta_0 = 30°$. Compare the motion with
the harmonic approximation reached by assuming $\theta$ is small, which has the analytical
solution $\theta = \theta_0 \cos(\omega t)$ with $\omega = \sqrt{g/l}$.


P8.2.6 A simple mechanism for the formation of ozone in the stratosphere consists of 
the following four reactions (known as the Chapman cycle): 


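$$\mathrm{O_2} + h\nu \rightarrow 2\mathrm{O} \qquad k_1 = 3 \times 10^{-12}\ \mathrm{s^{-1}}$$
$$\mathrm{O_2} + \mathrm{O} + \mathrm{M} \rightarrow \mathrm{O_3} + \mathrm{M} \qquad k_2 = 1.2 \times 10^{-33}\ \mathrm{cm^6\ molec^{-2}\ s^{-1}}$$
$$\mathrm{O_3} + h\nu' \rightarrow \mathrm{O} + \mathrm{O_2} \qquad k_3 = 5.5 \times 10^{-4}\ \mathrm{s^{-1}}$$
$$\mathrm{O} + \mathrm{O_3} \rightarrow 2\mathrm{O_2} \qquad k_4 = 6.9 \times 10^{-16}\ \mathrm{cm^3\ molec^{-1}\ s^{-1}}$$

where M is a nonreacting third body taken to be at the total air molecule concentration
for the altitude being considered. These reactions lead to the following
rate equations for [O], [O3] and [O2]:

$$\frac{d[\mathrm{O_2}]}{dt} = -k_1[\mathrm{O_2}] - k_2[\mathrm{O_2}][\mathrm{O}][\mathrm{M}] + k_3[\mathrm{O_3}] + 2k_4[\mathrm{O}][\mathrm{O_3}]$$
$$\frac{d[\mathrm{O}]}{dt} = 2k_1[\mathrm{O_2}] - k_2[\mathrm{O_2}][\mathrm{O}][\mathrm{M}] + k_3[\mathrm{O_3}] - k_4[\mathrm{O}][\mathrm{O_3}]$$
$$\frac{d[\mathrm{O_3}]}{dt} = k_2[\mathrm{O_2}][\mathrm{O}][\mathrm{M}] - k_3[\mathrm{O_3}] - k_4[\mathrm{O}][\mathrm{O_3}]$$

The rate constants apply at an altitude of 25 km, where $[\mathrm{M}] = 9 \times 10^{17}$ molec cm$^{-3}$.
Write a program to determine the concentrations of O3 and O as a function of time at
this altitude (you should find that [O2] remains pretty much constant). Start with initial
conditions $[\mathrm{O_2}]_0 = 0.21[\mathrm{M}]$, $[\mathrm{O}]_0 = [\mathrm{O_3}]_0 = 0$ and integrate for $10^8$ s (starting from
scratch it takes about three years to build an ozone layer with this mechanism). Compare
the equilibrium concentrations with the approximate analytical result obtained using the
steady-state approximation:

$$[\mathrm{O_3}] = \sqrt{\frac{k_1 k_2}{k_3 k_4}} \, [\mathrm{O_2}] [\mathrm{M}]^{1/2}, \qquad \frac{[\mathrm{O}]}{[\mathrm{O_3}]} = \frac{k_3}{k_2 [\mathrm{O_2}][\mathrm{M}]}.$$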


P8.2.7 Hyperion is an irregularly shaped moon of Saturn notable for its chaotic rotation.
Its motion may be modeled as follows.

The orbit of Hyperion (H) about Saturn (S) is an ellipse with semi-major axis, a, and
eccentricity, e. Let its point of closest approach (periapsis) be P. Its distance from the
planet, SH, as a function of its true anomaly (orbital angle, $\phi$, measured from the line
SP) is therefore

$$r = \frac{a(1 - e^2)}{1 + e\cos\phi}.$$

Define the angle $\theta$ to be that between the axis of the smallest principal moment of
inertia (loosely, the longest axis of the moon) and SP, and the quantity $\Omega$ to be a scaled
rate of change of $\theta$ with $\phi$ (i.e., the rate at which Hyperion spins as it orbits Saturn) as
follows:

$$\Omega = \frac{a^2}{r^2} \frac{d\theta}{d\phi}.$$

Now, it can be shown that

$$\frac{d\Omega}{d\phi} = -\frac{B - A}{C} \, \frac{3}{2(1 - e^2)} \, \frac{a}{r} \sin[2(\theta - \phi)],$$

where A, B and C are the principal moments of inertia.

Use scipy.integrate.odeint to find and plot the spin rate, $\Omega$, as a function of $\phi$
for the initial conditions (a) $\theta = \Omega = 0$ at $\phi = 0$, and (b) $\theta = 0$, $\Omega = 2$ at $\phi = 0$. Take
$e = 0.1$ and $(B - A)/C = 0.265$.

P8.2.8 The radioactive decay chain of $^{212}$Pb to the stable isotope $^{208}$Pb may be considered
as the following sequence of steps with the given rate constants, $k_i$:

$$^{212}\mathrm{Pb} \rightarrow {}^{212}\mathrm{Bi} + \beta^- \qquad k_1 = 1.816 \times 10^{-5}\ \mathrm{s^{-1}},$$
$$^{212}\mathrm{Bi} \rightarrow {}^{208}\mathrm{Tl} + \alpha \qquad k_2 = 6.931 \times 10^{-5}\ \mathrm{s^{-1}},$$
$$^{212}\mathrm{Bi} \rightarrow {}^{212}\mathrm{Po} + \beta^- \qquad k_3 = 1.232 \times 10^{-4}\ \mathrm{s^{-1}},$$
$$^{208}\mathrm{Tl} \rightarrow {}^{208}\mathrm{Pb} + \beta^- \qquad k_4 = 3.851 \times 10^{-3}\ \mathrm{s^{-1}},$$
$$^{212}\mathrm{Po} \rightarrow {}^{208}\mathrm{Pb} + \alpha \qquad k_5 = 2.310\ \mathrm{s^{-1}}.$$

By considering the following first-order differential equations giving the rates of
change for each species, plot their concentrations as a function of time.

$$\frac{d[^{212}\mathrm{Pb}]}{dt} = -k_1 [^{212}\mathrm{Pb}]$$
$$\frac{d[^{212}\mathrm{Bi}]}{dt} = k_1 [^{212}\mathrm{Pb}] - k_2 [^{212}\mathrm{Bi}] - k_3 [^{212}\mathrm{Bi}]$$
$$\frac{d[^{208}\mathrm{Tl}]}{dt} = k_2 [^{212}\mathrm{Bi}] - k_4 [^{208}\mathrm{Tl}]$$
$$\frac{d[^{212}\mathrm{Po}]}{dt} = k_3 [^{212}\mathrm{Bi}] - k_5 [^{212}\mathrm{Po}]$$
$$\frac{d[^{208}\mathrm{Pb}]}{dt} = k_4 [^{208}\mathrm{Tl}] + k_5 [^{212}\mathrm{Po}]$$

If all the intermediate species, J, are treated in "steady state" (i.e., $d[\mathrm{J}]/dt = 0$), the
approximate expression for the $^{208}$Pb concentration as a function of time is

$$[^{208}\mathrm{Pb}] = [^{212}\mathrm{Pb}]_0 \left(1 - e^{-k_1 t}\right).$$

Compare the "exact" result obtained by numerical integration of the differential equations
with this approximate answer.


8.3 Interpolation 

The package scipy.interpolate contains a large variety of functions and classes for
interpolation and splines in one and more dimensions. Some of the more important are
described in this section.


8.3.1 Univariate interpolation

The most straightforward one-dimensional interpolation functionality is provided by
scipy.interpolate.interp1d. Given arrays of points x and y, a function is
returned, which can be called to generate interpolated values at intermediate values of
x. The default interpolation scheme is linear, but other options (see Table 8.3) allow for
different schemes, as shown in the following example.


Example E8.17 This example demonstrates some of the different interpolation methods
available in scipy.interpolate.interp1d (see Figure 8.13).

Listing 8.13 A comparison of one-dimensional interpolation types using
scipy.interpolate.interp1d

# eg8-interp1d.py
import numpy as np
from scipy.interpolate import interp1d
import pylab

A, nu, k = 10, 4, 2

def f(x, A, nu, k):
    # A damped cosine: the "exact" function to be sampled.
    return A * np.exp(-k*x) * np.cos(2*np.pi * nu * x)

# Sample the function at nx coarsely spaced points.
xmax, nx = 0.5, 8
x = np.linspace(0, xmax, nx)
y = f(x, A, nu, k)

# Build the interpolating functions.
f_nearest = interp1d(x, y, kind='nearest')
f_linear = interp1d(x, y)
f_cubic = interp1d(x, y, kind='cubic')

# Compare the interpolants with the exact function on a fine grid.
x2 = np.linspace(0, xmax, 100)
pylab.plot(x, y, 'o', label='data points')
pylab.plot(x2, f(x2, A, nu, k), label='exact')
pylab.plot(x2, f_nearest(x2), label='nearest')
pylab.plot(x2, f_linear(x2), label='linear')
pylab.plot(x2, f_cubic(x2), label='cubic')
pylab.legend()
pylab.show()

Figure 8.13 An illustration of different one-dimensional interpolation methods with
scipy.interpolate.interp1d.

Table 8.3 Interpolation methods specified by the kind argument to
scipy.interpolate.interp1d

kind          Description
'linear'      The default, linear interpolation using only the values from the original data
              arrays bracketing the desired point
'nearest'     "Snap" to the nearest data point
'zero'        A zeroth-order spline: interpolates to the last value seen in its traversal of the
              data arrays
'slinear'     First-order spline interpolation (in practice, the same as 'linear')
'quadratic'   Second-order spline interpolation
'cubic'       Cubic spline interpolation


8.3.2 Multivariate interpolation 

We shall consider two kinds of multivariate interpolation, according to whether the
source data are structured (arranged on some kind of grid) or unstructured.


Interpolation from a rectangular grid

The simplest two-dimensional interpolation routine is scipy.interpolate.interp2d.
It requires a two-dimensional array of values, z, and the two (one-dimensional)
coordinate arrays x and y to which they correspond. These arrays need
not have constant spacing. Three kinds of interpolation spline are supported through the
kind argument: 'linear' (the default), 'cubic' and 'quintic'.
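Readers using a recent SciPy release should be aware that interp2d was deprecated in SciPy 1.10 and removed in SciPy 1.14. The following is a minimal sketch, not taken from the text, of how data on a rectangular (possibly unevenly spaced) grid might instead be interpolated with scipy.interpolate.RegularGridInterpolator. The sample function and grid are assumptions chosen for illustration, and the default linear scheme is used rather than a cubic spline.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Coarse grid, unevenly spaced in y (illustrative values).
x = np.linspace(0, 4, 13)
y = np.array([0, 2, 3, 3.5, 3.75, 3.875, 3.9375, 4])
X, Y = np.meshgrid(x, y)                  # both of shape (len(y), len(x))
Z = np.sin(np.pi * X / 2) * np.exp(Y / 2)

# RegularGridInterpolator expects values indexed as [i_x, i_y], hence Z.T.
interp = RegularGridInterpolator((x, y), Z.T)   # linear by default

# Evaluate on a finer, evenly spaced grid: the query points are passed
# as an array of shape (n, 2) holding (x, y) pairs.
x2 = np.linspace(0, 4, 65)
y2 = np.linspace(0, 4, 65)
X2, Y2 = np.meshgrid(x2, y2)
pts = np.column_stack([X2.ravel(), Y2.ravel()])
Z2 = interp(pts).reshape(X2.shape)
```

Unlike interp2d, the coordinate arrays are passed together as a tuple and the query points as explicit (x, y) pairs, so the result must be reshaped to match the target grid.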


Example E8.18 In the following example, we calculate the function

$$z(x, y) = \sin\left(\frac{\pi x}{2}\right) e^{y/2}$$

on a grid of points (x, y) which is not evenly spaced in the y-direction. We then use
scipy.interpolate.interp2d to interpolate these values onto a finer, evenly
spaced (x, y) grid: see Figure 8.14.

Listing 8.14 Two-dimensional interpolation with scipy.interpolate.interp2d

# eg8-interp2d.py
import numpy as np
from scipy.interpolate import interp2d
import matplotlib.pyplot as plt

# Coarse grid, unevenly spaced in the y-direction
x = np.linspace(0, 4, 13)
y = np.array([0, 2, 3, 3.5, 3.75, 3.875, 3.9375, 4])
X, Y = np.meshgrid(x, y)
Z = np.sin(np.pi*X/2) * np.exp(Y/2)

# Finer, evenly spaced grid
x2 = np.linspace(0, 4, 65)
y2 = np.linspace(0, 4, 65)
f = interp2d(x, y, Z, kind='cubic')    # ❶
Z2 = f(x2, y2)

fig, ax = plt.subplots(nrows=1, ncols=2)
ax[0].pcolormesh(X, Y, Z)

X2, Y2 = np.meshgrid(x2, y2)
ax[1].pcolormesh(X2, Y2, Z2)

plt.show()

❶ Note that interp2d requires the one-dimensional arrays, x and y.

Figure 8.14 Two-dimensional interpolation with scipy.interpolate.interp2d.


If the mesh of (x, y) coordinates forms a regularly spaced grid, the fastest way to
interpolate the values of z is to use a scipy.interpolate.RectBivariateSpline
object as in the following example.


Example E8.19 In the following code, the function

$$z(x, y) = e^{-4x^2} e^{-y^2/4}$$

is calculated on a regular, coarse grid and then interpolated onto a finer one (Figure
8.15).

Listing 8.15 Interpolation onto a regular two-dimensional grid with
scipy.interpolate.RectBivariateSpline

# eg8-RectBivariateSpline.py
import numpy as np
from scipy.interpolate import RectBivariateSpline
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Regularly spaced, coarse grid
dx, dy = 0.4, 0.4
xmax, ymax = 2, 4
x = np.arange(-xmax, xmax, dx)
y = np.arange(-ymax, ymax, dy)
X, Y = np.meshgrid(x, y)
Z = np.exp(-(2*X)**2 - (Y/2)**2)

interp_spline = RectBivariateSpline(y, x, Z)    # ❶

# Regularly spaced, fine grid
dx2, dy2 = 0.16, 0.16
x2 = np.arange(-xmax, xmax, dx2)
y2 = np.arange(-ymax, ymax, dy2)
X2, Y2 = np.meshgrid(x2, y2)
Z2 = interp_spline(y2, x2)

fig, ax = plt.subplots(nrows=1, ncols=2, subplot_kw={'projection': '3d'})
ax[0].plot_wireframe(X, Y, Z, color='k')
ax[1].plot_wireframe(X2, Y2, Z2, color='k')
for axes in ax:
    axes.set_zlim(-0.2, 1)
    axes.set_axis_off()

fig.tight_layout()
plt.show()

Figure 8.15 Two-dimensional interpolation from a coarse rectangular grid (left-hand plot) to a
finer one (right-hand plot) with scipy.interpolate.RectBivariateSpline.

❶ Note that for our function, z, defined using the meshgrid set up here, the
RectBivariateSpline method expects the corresponding one-dimensional arrays y
and x to be passed in this order (opposite to that of interp2d).⁹
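The argument order can be made less surprising by building the grid with matrix-style indexing. The sketch below is an illustration rather than part of the original listing: it uses the indexing='ij' option of np.meshgrid so that the value array has shape (len(x), len(y)), and checks that the resulting spline describes the same surface as the one built with the default indexing. The evaluation points x2 and y2 are arbitrary choices.

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

x = np.arange(-2, 2, 0.4)
y = np.arange(-4, 4, 0.4)

# Default ('xy') indexing: arrays have shape (len(y), len(x)).
X, Y = np.meshgrid(x, y)
Z_xy = np.exp(-(2*X)**2 - (Y/2)**2)
spline_xy = RectBivariateSpline(y, x, Z_xy)     # y first, then x

# Matrix ('ij') indexing: arrays have shape (len(x), len(y)).
Xij, Yij = np.meshgrid(x, y, indexing='ij')
Z_ij = np.exp(-(2*Xij)**2 - (Yij/2)**2)
spline_ij = RectBivariateSpline(x, y, Z_ij)     # x first, then y

# The two splines agree; their outputs are transposes of one another.
x2 = np.linspace(-1.5, 1.5, 7)
y2 = np.linspace(-3, 3, 9)
same = np.allclose(spline_xy(y2, x2), spline_ij(x2, y2).T)
print(Z_xy.shape, Z_ij.shape, same)
```

Which convention to prefer is largely a matter of taste; the important point is that the first coordinate array passed to RectBivariateSpline must correspond to the first axis of the value array.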


Interpolation of unstructured data 

To interpolate unstructured data (that is, data points provided at arbitrary coordinates
(x, y)) onto a grid, the function scipy.interpolate.griddata can be used. Its basic
usage for two dimensions is:

scipy.interpolate.griddata(points, values, xi, method='linear')

where the provided data are given as the one-dimensional array, values, at the
coordinates points, which is provided as a tuple of arrays x and y or as a single array of
shape (n, 2), where n is the length of the values array. xi is an array of the coordinates
to be interpolated onto (of shape (m, 2)); it may also be given as a tuple of coordinate
arrays such as those returned by np.meshgrid. The methods available are 'linear'
(the default), 'nearest' and 'cubic'.
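As a minimal sketch of this calling convention (the data here are invented for illustration), the following code samples a plane at scattered points and interpolates it at a few new locations. Linear interpolation reproduces a plane exactly inside the convex hull of the data, and points outside the hull are returned as nan by default.

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)      # fixed seed, for reproducibility only

# n scattered (x, y) points: the four corners of the square plus random
# interior points, as an array of shape (n, 2).
corners = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
points = np.vstack([corners, rng.uniform(-1, 1, size=(46, 2))])
values = 2 * points[:, 0] + 3 * points[:, 1]    # samples of z = 2x + 3y

# m = 3 target points as an array of shape (m, 2); the last lies outside
# the region covered by the data.
xi = np.array([[0.0, 0.0], [0.1, -0.2], [2.0, 2.0]])
zi = griddata(points, values, xi, method='linear')
print(zi)
```

The fill_value argument to griddata can be used to replace the nan returned for points outside the convex hull; the 'nearest' method does not need it, since every target point has a nearest neighbor.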


⁹ This issue is related to the way that meshgrid is indexed, which is based on the conventions of MATLAB.


Example E8.20 The following code illustrates the different kinds of interpolation
method available for scipy.interpolate.griddata using 400 points chosen
randomly from an interesting function. The results can be compared in Figure 8.16.

Listing 8.16 Interpolation from an unstructured array of two-dimensional points with
scipy.interpolate.griddata

# eg8-grid.interp.py
import numpy as np
from scipy.interpolate import griddata
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 100)
y = np.linspace(-1, 1, 100)
X, Y = np.meshgrid(x, y)

def f(x, y):
    s = np.hypot(x, y)
    phi = np.arctan2(y, x)
    tau = s + s*(1-s)/5 * np.sin(6*phi)
    return 5*(1-tau) + tau

T = f(X, Y)

# Choose npts random points from the discrete domain of our model function
npts = 400
px, py = np.random.choice(x, npts), np.random.choice(y, npts)

fig, ax = plt.subplots(nrows=2, ncols=2)

# Plot the model function and the randomly selected sample points
ax[0,0].contourf(X, Y, T)
ax[0,0].scatter(px, py, c='k', alpha=0.2, marker='.')
ax[0,0].set_title('Sample points on f(X,Y)')

# Interpolate using three different methods and plot
for i, method in enumerate(('nearest', 'linear', 'cubic')):
    Ti = griddata((px, py), f(px, py), (X, Y), method=method)
    r, c = (i + 1) // 2, (i + 1) % 2
    ax[r,c].contourf(X, Y, Ti)
    ax[r,c].set_title('method = {}'.format(method))

plt.show()

Figure 8.16 Some different interpolation schemes for scipy.interpolate.griddata.


8.4 Optimization, data-fitting and root-finding 


The scipy.optimize package provides a range of popular algorithms for minimization
of multidimensional functions (with or without additional constraints), least-squares
data-fitting and multidimensional equation solving (root-finding). This section
will give an overview of the more important options available, but it should be borne
in mind that the best choice of algorithm will depend on the individual function being
analyzed. For an arbitrary function, there is no guarantee that a particular method will
converge on the desired minimum (or root, etc.), or that if it does so it will converge
quickly. Some algorithms are better suited to certain functions than others, and the more
you know about your function the better. SciPy can be configured to issue a warning
message when a particular algorithm fails, and this message can usually help to analyze
the problem.

Furthermore, the result returned often depends on the initial guess provided to the
algorithm: consider a two-dimensional function as a landscape with several valleys
separated by steep ridges. An initial guess placed within one valley is likely to lead most
algorithms to wander downhill and find the minimum in that valley (even if it isn't the
global minimum) without climbing the ridges. Similarly, you might expect (but cannot
guarantee) that most numerical root-finders return the "nearest" root to the initial guess.
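The dependence on the initial guess can be seen even in one dimension. The following sketch uses scipy.optimize.minimize (introduced in the next section) on a tilted double-well function, g, which is an illustrative choice and not a function from the text: it has a deeper minimum near x = -1 and a shallower one near x = +1, separated by a ridge close to x = 0.

```python
import numpy as np
from scipy.optimize import minimize

def g(X):
    """A tilted double well with minima near x = -1 and x = +1."""
    x = X[0]
    return (x**2 - 1)**2 + 0.3 * x

# Start on either side of the ridge separating the two wells.
left = minimize(g, [-0.5])
right = minimize(g, [0.5])

print('start at -0.5:', left.x, left.fun)
print('start at +0.5:', right.x, right.fun)
```

Each run should settle in the well it started in, so only the first locates the global minimum. When the landscape is not known in advance, repeating the minimization from several starting points is a simple, if not foolproof, precaution.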


8.4.1 Minimization 


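SciPy's optimization routines minimize a function of one or more variables, $f(x_1, x_2, \ldots, x_n)$.
To find the maximum, one determines the minimum of $-f(x_1, x_2, \ldots, x_n)$.

Some of the minimization algorithms only require the function itself to be evaluated;
others require its first derivative with respect to each of the variables in an array known
as the Jacobian:

$$J(f) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right).$$

Some algorithms will attempt to estimate the Jacobian numerically if it cannot be provided
as a separate function.

Furthermore, some sophisticated optimization algorithms require information about
the second derivatives of the function, a symmetric matrix of values called the Hessian:

$$H(f) = \begin{pmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}.$$

Just as the Jacobian represents the local gradient of a function of several variables, the
Hessian represents the local curvature.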
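To give a feel for what a numerical estimate of the Jacobian involves, the sketch below compares a finite-difference gradient from scipy.optimize.approx_fprime with the analytical one. The quadratic function g, its gradient grad_g and the evaluation point are illustrative assumptions; the step size is a conventional choice for forward differences and not necessarily the one used internally by any particular minimizer.

```python
import numpy as np
from scipy.optimize import approx_fprime

def g(X):
    x, y = X
    return x**2 + 3*x*y + 2*y**2

def grad_g(X):
    """Analytical Jacobian of g: (dg/dx, dg/dy)."""
    x, y = X
    return np.array([2*x + 3*y, 3*x + 4*y])

X0 = np.array([1.0, -2.0])
eps = np.sqrt(np.finfo(float).eps)      # forward-difference step size

numerical = approx_fprime(X0, g, eps)   # finite-difference estimate
analytic = grad_g(X0)
print(numerical, analytic)
```

The estimate costs one extra function evaluation per variable and carries a small truncation error, which is why supplying an analytical Jacobian, where one is available, tends to be both faster and more accurate.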


Unconstrained minimization 

The general routine for the minimization of multivariate scalar functions is
scipy.optimize.minimize, which takes two mandatory arguments:

minimize(fun, x0, ...)

The first is a function object, fun, for evaluating the function to be minimized: this
function should take an array of values, x, defining the point at which it is to be evaluated
$(x_1, x_2, \ldots, x_n)$, followed by any further arguments it requires. The second required
argument, x0, is an array of values representing the initial guess for the minimization
algorithm to start at.

In this section we will demonstrate the use of minimize with Himmelblau's function,
a simple two-dimensional function with some awkward features that make it a good
test function for optimization algorithms. Himmelblau's function is

$$f(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2.$$

The region $-5 \le x \le 5$, $-5 \le y \le 5$ contains one local maximum,

$$f(-0.270845, -0.923039) = 181.617$$

(though the function climbs steeply outside of this region). There are four minima:

$$f(3, 2) = 0,$$
$$f(-2.805118, 3.131312) = 0,$$
$$f(-3.779310, -3.283186) = 0,$$
$$f(3.584428, -1.848126) = 0,$$

and four saddle points. Figure 8.17 shows a contour plot of the function.

Figure 8.17 Contour plot of Himmelblau's function.

The function may be defined in Python in the usual way:

In [x]: def f(X):
   ...:     x, y = X
   ...:     return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

where for clarity we have unpacked the array, X, holding $(x_1, x_2)$ into the named values
$x_1 = x$ and $x_2 = y$.

To find a minimum, call minimize with some initial guess, say (x, y) = (0, 0):

In [x]: from scipy.optimize import minimize
In [x]: minimize(f, (0,0))
Out[x]:
     jac: array([ -8.77780211e-06,  -3.52519449e-06])
 message: 'Optimization terminated successfully.'
     fun: 6.15694370233122e-13
    njev: 16
hess_inv: array([[ 0.01575433, -0.00956965],
                 [-0.00956965,  0.03491686]])
  status: 0
    nfev: 64
 success: True
       x: array([ 2.99999989,  1.99999996])

minimize returns a dictionary-like object with information about the minimization.
The important fields are described in Table 8.4: if the minimization is successful, the
minimum appears as x in this object. Here we have converged close to the minimum
$f(3, 2) = 0$.

Table 8.4 Minimization information dictionary returned by scipy.optimize.minimize

Key                 Description
success             A boolean value indicating whether or not the minimization was
                    successful
x                   If successful, the solution: the values of (x1, x2, ..., xn) at which
                    the function is a minimum. If the algorithm was not successful, x
                    indicates the point at which it gave up
fun                 If successful, the value of the function at the minimum identified as x
message             A string describing the outcome of the minimization
jac                 The value of the Jacobian: if the minimization is successful the values
                    in this array should be close to zero
hess, hess_inv      The Hessian and its inverse (if used)
nfev, njev, nhev    The number of evaluations of the function, its Jacobian and its
                    Hessian
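In a program, rather than an interactive session, the fields of the returned object are usually inspected individually. The following is a minimal sketch of how this might be done; it assumes only that the returned object supports both attribute access and dictionary-style access, and the variable name res is an arbitrary choice.

```python
from scipy.optimize import minimize

def f(X):
    """Himmelblau's function."""
    x, y = X
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

res = minimize(f, (0, 0))

# Always check success before trusting the solution.
if res.success:
    print('minimum found at', res.x, 'where f =', res.fun)
else:
    print('minimization failed:', res.message)

# Fields may also be looked up by key, as with a dictionary.
print(res['nfev'] == res.nfev)
print(sorted(res.keys()))
```

Which keys are present depends on the method used: for example, an algorithm that does not use the Hessian will not report hess or nhev.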
Table 8.5 Some of the minimization methods used by scipy.optimize.minimize

method         Description
BFGS           Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, the default for
               minimization without constraints or bounds
Nelder-Mead    Nelder-Mead algorithm, also known as the downhill simplex or amoeba
               method. No derivatives are needed
CG             Conjugate gradient method
Powell         Powell's method (no derivatives are needed with this algorithm)
dogleg         Dog-leg trust-region algorithm (unconstrained minimization). Requires the
               Jacobian and the Hessian (which must be positive-definite)
TNC            Truncated Newton algorithm for minimization within bounds
l-bfgs-b       Bound-constrained minimization with the L-BFGS-B algorithm
slsqp          "Sequential least squares programming" method for minimization with
               bounds and equality and inequality constraints
cobyla         "Constrained optimization by linear approximation" method for constrained
               minimization


The algorithm to be used by minimize is specified by setting its method argument
to one of the strings given in Table 8.5. The default algorithm, BFGS, is a good
general-purpose quasi-Newton method that can approximate the Jacobian if it is not provided
and does not use the Hessian. However, it struggles to find the maximum of
Himmelblau's function:

In [x]: mf = lambda X: -f(X)    # to find the maximum, minimize -f(x,y)
In [x]: minimize(mf, (0,0))
Out[x]:
     jac: array([  1.17853903e+13,   4.57328118e+13])
 message: 'Desired error not necessarily achieved due to precision loss.'
     fun: -2.9978221235736595e+17
    njev: 16
hess_inv: array([[ 1.03696455, -0.26722678],
                 [-0.26722678,  0.0688646 ]])
  status: 2
    nfev: 76
 success: False
       x: array([-14336., -22528.])

Starting at (0,0), the BFGS algorithm has wandered up one of the steep sides of the
Himmelblau function (note the size of the Jacobian) and failed to converge. In fact, we
need to start quite close to the maximum to succeed:

In [x]: minimize(mf, (-0.2,-1))
Out[x]:
     jac: array([  3.81469727e-06,   1.90734863e-06])
 message: 'Optimization terminated successfully.'
     fun: -181.61652152258262
    njev: 8
hess_inv: array([[ 0.0232834 , -0.00626945],
                 [-0.00626945,  0.06137267]])
  status: 0
    nfev: 32
 success: True
       x: array([-0.27084453, -0.92303852])

This is, of course, not much help if we don’t know in advance where the maximum is! 
Let’s try a different minimization algorithm, starting at our arbitrary guess, (0,0): 

In [x]: minimize(mf, (0,0), method='nelder-mead') 

Out[x]: 

status: 0 
nfev: 115 
success: True 

message: 'Optimization terminated successfully.' 
fun: -181.61652150549165 
nit: 59 

x: array([-0.27086815, -0.92300745]) 


The Nelder-Mead algorithm is a simplex method that does not need or estimate the
derivatives of the function, so it isn't tempted up the steep sides of the function.
However, it has taken 115 function evaluations to converge on the local maximum.
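One way to compare algorithms on a given problem is simply to loop over the method names and inspect the reported number of function evaluations. The sketch below does this for the two methods discussed so far; the exact counts, and whether BFGS fails in the same way, may differ between SciPy versions, so no particular output is claimed here.

```python
import numpy as np
from scipy.optimize import minimize

def f(X):
    """Himmelblau's function."""
    x, y = X
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def mf(X):
    """The negative of f: minimizing mf locates a maximum of f."""
    return -f(X)

results = {}
for method in ('BFGS', 'Nelder-Mead'):
    res = minimize(mf, (0, 0), method=method)
    results[method] = res
    print(method, 'success =', res.success, 'nfev =', res.nfev, 'x =', res.x)
```

A low evaluation count is only meaningful alongside success: a method that gives up quickly has not done better than one that converges slowly.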

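As a final example, consider the dogleg method, which requires minimize to be
passed functions evaluating the Jacobian and the Hessian. The necessary derivatives
have simple analytical forms for Himmelblau's function:

$$\frac{\partial f}{\partial x} = 4x(x^2 + y - 11) + 2(x + y^2 - 7),$$
$$\frac{\partial f}{\partial y} = 2(x^2 + y - 11) + 4y(x + y^2 - 7),$$
$$\frac{\partial^2 f}{\partial x^2} = 12x^2 + 4y - 42,$$
$$\frac{\partial^2 f}{\partial y^2} = 12y^2 + 4x - 26,$$
$$\frac{\partial^2 f}{\partial y \partial x} = \frac{\partial^2 f}{\partial x \partial y} = 4x + 4y.$$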
The Jacobian and Hessian can be coded up as follows:

In [x]: def df(X):
   ...:     x, y = X
   ...:     f1, f2 = x**2 + y - 11, x + y**2 - 7
   ...:     dfdx = 4*x*f1 + 2*f2
   ...:     dfdy = 2*f1 + 4*y*f2
   ...:     return np.array([dfdx, dfdy])

In [x]: def ddf(X):
   ...:     x, y = X
   ...:     d2fdx2 = 12*x**2 + 4*y - 42
   ...:     d2fdy2 = 12*y**2 + 4*x - 26
   ...:     d2fdxdy = 4*(x + y)
   ...:     return np.array([[d2fdx2, d2fdxdy], [d2fdxdy, d2fdy2]])

In [x]: mdf = lambda X: -df(X)      # ❶
In [x]: mddf = lambda X: -ddf(X)

❶ Note that as with the function itself, we need to use the negative of the Jacobian and
Hessian if we seek the maximum: these are defined as lambda functions mdf and mddf.


In [x]: minimize(mf, (0,0), jac=mdf, hess=mddf, method='dogleg')
Out[x]:
     jac: array([ -1.26922473e-10,   1.23685240e-09])
 message: 'Optimization terminated successfully.'
     fun: -181.6165215225827
    hess: array([[ 44.81187272,   4.77553259],
                 [  4.77553259,  16.85937624]])
     nit: 4
    njev: 5
       x: array([-0.27084459, -0.92303856])
  status: 0
    nfev: 5
 success: True
    nhev: 4


The algorithm has converged successfully on the local maximum in five function
evaluations, five Jacobian evaluations and four Hessian evaluations.
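Because a mistake in a hand-coded derivative can silently spoil a minimization, it may be worth checking the Jacobian against a finite-difference estimate before using it. A minimal sketch using scipy.optimize.check_grad follows; the test points are arbitrary, randomly chosen values, and the functions f and df are repeated here so that the block is self-contained.

```python
import numpy as np
from scipy.optimize import check_grad

def f(X):
    """Himmelblau's function."""
    x, y = X
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def df(X):
    """Analytical Jacobian of Himmelblau's function."""
    x, y = X
    f1, f2 = x**2 + y - 11, x + y**2 - 7
    return np.array([4*x*f1 + 2*f2, 2*f1 + 4*y*f2])

rng = np.random.default_rng(1)          # fixed seed, for reproducibility only
errs = []
for X0 in rng.uniform(-5, 5, size=(3, 2)):
    # check_grad returns the norm of the difference between df(X0) and
    # a finite-difference approximation to the gradient of f at X0.
    err = check_grad(f, df, X0)
    errs.append(err)
    print(X0, err)
```

Values that are small compared with the size of the gradient itself suggest that df is consistent with f; a sign error or a missing term typically shows up as a discrepancy of order one or larger.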


♢ Constrained optimization

Sometimes it is necessary to find the maximum or minimum of a function subject to one
or more constraints. To use the earlier mentioned function as an example, you may wish
for the single minimum of f(x, y) that satisfies x > 0, y > 0; or the minimum value of
the function along the line x = y.

The algorithms l-bfgs-b, tnc and slsqp support the bounds argument to
minimize. bounds is a sequence of tuples, each giving the (min, max) pair for
one variable of the function, defining the bounds on that variable in the minimization.
If there is no bound in either direction, use None.

For example, if we try to find a minimum in f(x, y) starting at (-1/2, -1/2) without
specifying any bounds, the slsqp method converges (just about) on the one at (-2.805118,
3.131312):

In [x]: minimize(f, (-0.5,-0.5), method='slsqp')

Out [x] : 

jac: array( [-0.00721077, 0.00037714, 0. ]) 

message: 'Optimization terminated successfully.' 
fun: 4.0198760213901536e-07 
nit: 10 
njev: 10 

x: array( [-2.80522924, 3.131319 ]) 

status: 0 
nfev: 46 
success: True 

To stay in the quadrant x < 0, y < 0, set bounds with no minimum on x or y and a
maximum bound at x = 0 and y = 0:

In [x]: xbounds = (None, 0)
In [x]: ybounds = (None, 0)
In [x]: bounds = (xbounds, ybounds)
In [x]: minimize(f, (-0.5,-0.5), bounds=bounds, method='slsqp')
Out[x]:
     jac: array([-0.00283595, -0.00034243,  0.        ])
 message: 'Optimization terminated successfully.'
     fun: 4.115667606325133e-08
     nit: 11
    njev: 11
       x: array([-3.77933774, -3.28319868])
  status: 0
    nfev: 50
 success: True


Suppose we wish to find the extrema of Himmelblau’s function that also satisfy 
the condition x = y (that is, they lie along the diagonal of Figure 8.17). Two of the 
minimization methods listed in Table 8.5 allow for constraints, cobyla and slsqp, so 
we must use one of these. 

Constraints are specified as the argument constraints to the minimize function
as a sequence of dictionaries with the string keys 'type', giving the type of constraint, and
'fun', a callable object implementing the constraint. 'type' may be 'eq' or 'ineq'
for a constraint based on an equality (such as x = y) or an inequality (e.g., x > 2y - 1).
Note that cobyla does not support equality constraints.

An equality constraint function should return zero if the constraint is met;
an inequality constraint function should return a non-negative value if the inequality is
met.
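As a small illustration of the sign convention for inequality constraints, the following sketch minimizes the squared distance from the point (1, 2) subject to x + y <= 2. The function g and the constraint are assumptions chosen because the answer is easy to verify by hand: the constrained minimum is the closest point on the line x + y = 2, which is (0.5, 1.5).

```python
import numpy as np
from scipy.optimize import minimize

def g(X):
    """Squared distance from the point (1, 2)."""
    x, y = X
    return (x - 1)**2 + (y - 2)**2

# The condition x + y <= 2 must be rearranged so that the constraint
# function is non-negative when it is satisfied: 2 - x - y >= 0.
con = {'type': 'ineq', 'fun': lambda X: 2 - X[0] - X[1]}

res = minimize(g, (0, 0), constraints=con, method='slsqp')
print(res.x, res.fun)
```

Writing the constraint the other way round, as x + y - 2, would instead confine the search to the region x + y >= 2, in which the unconstrained minimum at (1, 2) is allowed.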

To find the minima in f(x, y) subject to the constraint x = y, we can use the slsqp
method with an equality constraint function returning x - y:

In [x]: con = {'type': 'eq', 'fun': lambda X: X[0] - X[1]}
In [x]: minimize(f, (0,0), constraints=con, method='slsqp')
Out[x]:
     jac: array([-16.33084416,  16.33130538,   0.        ])
 message: 'Optimization terminated successfully.'
     fun: 8.0000000007160867
     nit: 7
    njev: 7
       x: array([ 2.54138438,  2.54138438])
  status: 0
    nfev: 32
 success: True

The method converged on one of the minima (there is another: start at, for example, (-2, -2)
to find it). What about the maximum?

In [x]: minimize(mf, (0,0), constraints=con, method='slsqp')
Out[x]:
     jac: array([ 0.,  0.,  0.])
 message: 'Singular matrix C in LSQ subproblem'
     fun: -3.1826053300603689e+68
     nit: 4
    njev: 4
       x: array([ -1.12315113e+17,  -1.12315113e+17])
  status: 6
    nfev: 16
 success: False

That didn't go so well: the algorithm wandered up the side of a valley. A better choice
of algorithm here is cobyla, but this method doesn't support equality constraints, so
we will build one from a pair of inequalities: x = y if neither x > y nor x < y holds, that
is, if both x >= y and x <= y are satisfied:

In [x]: con1 = {'type': 'ineq', 'fun': lambda X: X[0] - X[1]}
In [x]: con2 = {'type': 'ineq', 'fun': lambda X: X[1] - X[0]}
In [x]: minimize(mf, (0,0), constraints=(con1, con2), method='cobyla')
Out[x]:
  status: 1
    nfev: 34
 success: True
 message: 'Optimization terminated successfully.'
     fun: -179.12499987327624
   maxcv: 0.0
       x: array([-0.49994148, -0.49994148])

Here, the constraint function defined in con1 returns a non-negative value if x >= y and
that defined in con2 returns a non-negative value if x <= y. The only way both can be
satisfied is if x = y.

Minimizing a function of one variable 

If the function to be minimized is univariate (i.e., takes only one variable, a scalar),
a faster algorithm is provided by scipy.optimize.minimize_scalar. To simply
return a minimum, this function can be called with method='brent', which implements
Brent's method for locating a minimum.

Ideally, one should "bracket" the minimum first by providing values for x, (a, b, c),
such that f(a) > f(b) and f(c) > f(b). This can be done with the bracket argument,
which takes the tuple (a, b, c). If this isn't possible or feasible, provide an interval of
two values of x on which to start a search for such a bracket (in the downhill direction).
If no bracket argument is specified, this search is initiated from the interval (0, 1).
Figure 8.18 gives an example polynomial with two minima and a maximum.
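The downhill search for a bracket can also be carried out explicitly with scipy.optimize.bracket, which may be useful for checking which minimum a given starting interval leads to. The sketch below applies it to the polynomial of Figure 8.18; the starting interval (3, 4) is an arbitrary choice for illustration.

```python
from numpy.polynomial import Polynomial
from scipy.optimize import bracket, minimize_scalar

# f(x) = x^4 - 3x^3 - 24x^2 + 28x + 48 (coefficients in increasing order)
f = Polynomial((48., 28., -24., -3., 1.))

# Search downhill from the interval (3, 4) for a triple (xa, xb, xc)
# with f(xb) < f(xa) and f(xb) < f(xc).
xa, xb, xc, fa, fb, fc, funcalls = bracket(f, xa=3, xb=4)
print((xa, xb, xc), (fa, fb, fc))

# The triple can be passed straight on to minimize_scalar.
res = minimize_scalar(f, bracket=(xa, xb, xc))
print(res.x)
```

Note that a bracket guarantees only that some local minimum lies between its end points; it says nothing about whether that minimum is the global one.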

With no bracket, minimize_scalar converges on the minimum at -2.841 for this
function:

In [x]: Polynomial = np.polynomial.Polynomial
In [x]: from scipy.optimize import minimize_scalar
In [x]: f = Polynomial((48., 28., -24., -3., 1.))
In [x]: minimize_scalar(f)
Out[x]:
  fun: -91.32163915433344
 nfev: 11
    x: -2.8410443265958261
  nit: 10

If we bracket the other minimum by providing values (a, b, c) = (3, 4, 6), which can
be seen from Figure 8.18 to satisfy f(a) > f(b) < f(c), the algorithm converges on
4.549:

In [x]: minimize_scalar(f, bracket=(3,4,6))
Out[x]:
  fun: -175.45563549487974
 nfev: 11
    x: 4.5494683642571934
  nit: 10

Figure 8.18 The polynomial $f(x) = x^4 - 3x^3 - 24x^2 + 28x + 48$.
Finally, to find the maximum, call minimize_scalar with −f(x). This time we will 
initialize a search for a bracket to the minimum of −f(x) with the pair of values (−1, 0): 

In [x]: minimize_scalar(-f, bracket=(-1, 0))
Out[x]:
  fun: -55.734305899213226
 nfev: 9
    x: 0.54157595897344157
  nit: 8
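The three stationary points found above can be cross-checked by confirming that the derivative of the polynomial vanishes at each of them. The following is a minimal sketch rather than part of the original example: it assumes explicit three-point brackets for all three searches (the brackets (−4, −3, −2) and (0, 0.5, 1.5) are choices made here by evaluating f, not values from the text) and uses the deriv method of the Polynomial object.

```python
from numpy.polynomial import Polynomial
from scipy.optimize import minimize_scalar

# The polynomial of Figure 8.18 and its first derivative
f = Polynomial((48., 28., -24., -3., 1.))
df = f.deriv()

# Each bracket (a, b, c) satisfies g(a) > g(b) < g(c) for the function g minimized
left_min = minimize_scalar(f, bracket=(-4, -3, -2))
right_min = minimize_scalar(f, bracket=(3, 4, 6))
# The maximum of f is the minimum of -f
local_max = minimize_scalar(lambda x: -f(x), bracket=(0, 0.5, 1.5))

for label, res in (('left minimum', left_min),
                   ('right minimum', right_min),
                   ('maximum', local_max)):
    # df(res.x) should be close to zero at a stationary point
    print('{}: x = {:.4f}, f(x) = {:.4f}, df/dx = {:.1e}'.format(
          label, res.x, f(res.x), df(res.x)))
```

Supplying a full bracket removes any dependence on where the automatic downhill search happens to start, which is useful when a function has more than one minimum.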


Example E8.21 A simple model for the envelope of an airship treats it as the volume 
of revolution obtained from a pair of quarter-ellipses joined at their (equal) semi-minor 
axes. The semi-major axis of the aft ellipse is taken to be longer than that representing 
the bow by a factor α = 6. Equations describing the cross section (in the vertical plane) 
of the airship envelope may be written 

$$y = \frac{b}{a}\sqrt{x(2a - x)} \quad (x \le a),$$

$$y = \frac{b}{a}\sqrt{a^2 - \frac{(x - a)^2}{\alpha^2}} \quad (a < x \le a(\alpha + 1)).$$

The drag on the envelope is given by the formula 

$$D = \tfrac{1}{2}\rho_\mathrm{air} v^2 V^{2/3} C_{DV},$$

where $\rho_\mathrm{air}$ is the air density, v the speed of the airship, V the envelope volume and the 
drag coefficient, $C_{DV}$, is estimated using the following empirical formula:¹⁰ 

$$C_{DV} = \mathrm{Re}^{-1/6}\left[0.172\,(l/d)^{1/3} + 0.252\,(d/l)^{1.2} + 1.032\,(d/l)^{2.7}\right].$$

Here, $\mathrm{Re} = \rho_\mathrm{air} v l/\mu$ is the Reynolds number and μ the dynamic viscosity of the air. l 
and d are the airship length and maximum diameter (= 2b) respectively. 

Suppose we want to minimize the drag with respect to the parameters a and b but fix 
the total volume of the airship envelope, $V = \frac{2}{3}\pi a b^2 (1 + \alpha)$. The following program 
does this using the slsqp algorithm, for a volume of 200000 m³, that of the Hindenburg. 

Listing 8.17 Minimizing the drag on an airship envelope 


# eg8-airship.py
import numpy as np
from scipy.optimize import minimize

# air density (kg.m-3) and dynamic viscosity (Pa.s) at cruise altitude
rho, mu = 1.1, 1.5e-5
# air speed (m.s-1) at cruise altitude
v = 30

def CDV(L, d):
    """ Calculate the drag coefficient. """
    Re = rho * v * L / mu       # Reynolds number
    r = L / d                   # "Fineness" ratio
    return (0.172 * r**(1/3) + 0.252 / r**1.2 + 1.032 / r**2.7) / Re**(1/6)

def D(X):
    """ Return the total drag on the airship envelope. """
    a, b = X
    L = a * (1+alpha)
    return 0.5 * rho * v**2 * V(X)**(2/3) * CDV(L, 2*b)

# Fixed total volume of the airship envelope (m3)
V0 = 2.e5
# Parameter describing the tapering of the stern of the envelope
alpha = 6

def V(X):
    """ Return the volume of the envelope. """
    a, b = X
    return 2 * np.pi * a * b**2 * (1+alpha) / 3

# Minimize the drag, constraining the volume to be equal to V0
a0, b0 = 70, 45         # initial guesses for a, b
con = {'type': 'eq', 'fun': lambda X: V(X)-V0}
res = minimize(D, (a0, b0), method='slsqp', constraints=con)
if res['success']:
    a, b = res['x']
    L, d = a * (1+alpha), 2*b       # length, greatest diameter
    print('Optimum parameters: a = {:g} m, b = {:g} m'.format(a, b))
    print('V = {:g} m3'.format(V(res['x'])))
    print('Drag, D = {:g} N'.format(res['fun']))
    print('Total length, L = {:g} m'.format(L))
    print('Greatest diameter, d = {:g} m'.format(d))
    print('Fineness ratio, L/d = {:g}'.format(L/d))
else:
    # We failed to converge: output the results dictionary
    print('Failed to minimize D!', res, sep='\n')


10 S. F. Hoerner, Fluid Dynamic Drag, Hoerner Fluid Dynamics (1965). 



This example is a little contrived, since for fixed α the requirement that V be constant 
means that a and b are not independent, but a solution is found readily enough: 

Optimum parameters: a = 32.9301 m, b = 20.3536 m
V = 200000 m3
Drag, D = 20837.6 N
Total length, L = 230.51 m
Greatest diameter, d = 40.7071 m
Fineness ratio, L/d = 5.66266

The actual dimensions of the Hindenburg were l = 245 m, d = 41 m, giving the ratio 
l/d = 5.98, so we didn’t do too badly. 
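Because the volume constraint ties b to a, the same optimum can be sought as a one-dimensional problem. The sketch below is an alternative formulation, not the method of the listing: it assumes the same physical constants, eliminates b through $b = \sqrt{3V_0 / (2\pi a (1+\alpha))}$, and applies minimize_scalar with the bounded method on an assumed search interval of 10 m to 100 m for a. The helper names b_from_a and drag are introduced here for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Constants as in the airship listing (SI units)
rho, mu, v = 1.1, 1.5e-5, 30
V0, alpha = 2.e5, 6

def b_from_a(a):
    """Semi-minor axis fixed by the volume constraint V = 2*pi*a*b^2*(1+alpha)/3."""
    return np.sqrt(3 * V0 / (2 * np.pi * a * (1 + alpha)))

def drag(a):
    """Drag on the envelope as a function of a alone, at fixed volume V0."""
    b = b_from_a(a)
    L, d = a * (1 + alpha), 2 * b
    Re = rho * v * L / mu
    r = L / d
    CDV = (0.172 * r**(1/3) + 0.252 / r**1.2 + 1.032 / r**2.7) / Re**(1/6)
    return 0.5 * rho * v**2 * V0**(2/3) * CDV

# The search interval for a is an assumption made for this sketch
res = minimize_scalar(drag, bounds=(10, 100), method='bounded')
a_opt, b_opt = res.x, b_from_a(res.x)
print('a = {:g} m, b = {:g} m, D = {:g} N'.format(a_opt, b_opt, res.fun))
```

The values obtained this way should agree closely with those reported by the constrained slsqp calculation.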


8.4.2 Nonlinear least squares fitting 

SciPy’s general nonlinear least squares fitting routine is scipy.optimize.leastsq, 
which has the most basic call signature: 

scipy.optimize.leastsq(func, x0, args=())

This will attempt to fit a sequence of data points, y, to a model function, f, which 
depends on one or more fit parameters. leastsq is passed a related function object, 
func, which returns the difference between y and f (the residuals). leastsq also 
requires an initial guess for the fitted parameters, x0. If func requires any other 
arguments (typically, arrays of the data, y, and one or more independent variables), pass 
them in the sequence args. For example, consider fitting the artificial noisy decaying 
cosine function, $f(t) = A e^{-t/\tau} \cos 2\pi\nu t$ (Figure 8.19). 

In [x]: import numpy as np
In [x]: import pylab

In [x]: A, freq, tau = 10, 4, 0.5
In [x]: def f(t, A, freq, tau):
   ...:     return A * np.exp(-t/tau) * np.cos(2*np.pi * freq * t)

In [x]: tmax, dt = 1, 0.01
In [x]: t = np.arange(0, tmax, dt)
In [x]: yexact = f(t, A, freq, tau)
In [x]: y = yexact + np.random.randn(len(yexact))*2
In [x]: pylab.plot(t, yexact)
In [x]: pylab.plot(t, y)
In [x]: pylab.show()

Figure 8.19 A synthetic noisy decaying cosine function. 

To fit this noisy data, y, to the parameters A, freq and tau (pretending we don’t 
know them), we first define our residuals function: 


In [x]: def residuals(p, y, t):
   ...:     A, freq, tau = p
   ...:     return y - f(t, A, freq, tau)


The first argument is the sequence of parameters, p, which we unpack into named 
variables for clarity. The additional arguments needed are the data itself, y, and the 
independent variable, t. Now make some initial guesses for the parameters that aren’t 
too wildly off and call leastsq: 


In [x]: from scipy.optimize import leastsq
In [x]: p0 = 5, 5, 1
In [x]: plsq = leastsq(residuals, p0, args=(y, t))
In [x]: plsq[0]
Out[x]: array([ 9.33962672,  4.04958427,  0.48637434])

As with SciPy’s other optimization routines, leastsq can be configured to return more 
information about its working, but here we report only the solution (best fit parameters), 
which is always the first item in the plsq tuple. 

The true values are A, freq, tau = 10, 4, 0.5, so given the noise we haven’t 
done badly. Graphically, 

In [x]: pylab.plot(t, y, 'o', c='k', label='Data')
In [x]: pylab.plot(t, yexact, c='gray', label='Exact')
In [x]: pylab.plot(t, f(t, *plsq[0]), c='k', label='Fit')
In [x]: pylab.legend()
In [x]: pylab.show()


The fit is illustrated in Figure 8.20. 

If it is known, it is also possible to pass the Jacobian to leastsq, as the following 
example demonstrates. 

Figure 8.20. 


Example E8.22 In this example, we are given a noisy series of data points that we want 
to fit to an ellipse. The equation for an ellipse may be written as a nonlinear function of 
angle, θ (0 ≤ θ < 2π), which depends on the parameters a (the semi-major axis) and e 
(the eccentricity): 

$$r(\theta; a, e) = \frac{a(1 - e^2)}{1 - e\cos\theta}.$$

To fit a sequence of data points (θ, r) to this function, we first code it as a Python 
function taking two arguments: the independent variable, theta, and a tuple of the 
parameters, p = (a, e). The function we wish to minimize is the difference between 
this model function and the data, r, defined as the function residuals: 

def f(theta, p):
    a, e = p
    return a * (1 - e**2)/(1 - e*np.cos(theta))

def residuals(p, r, theta):
    return r - f(theta, p)

We also need to give leastsq an initial guess for the fit parameters, say p0 = 
(1, 0.5). The simplest call to fit the function would then pass to leastsq the objects 
residuals, p0 and args=(r, theta) (the additional arguments needed by the 
residuals function): 

plsq = leastsq(residuals, p0, args=(r, theta))

If at all possible, however, it is better to also provide the Jacobian (the first derivative 
of the fit function with respect to the parameters to be fitted). Expressions for these are 
straightforward to calculate and implement: 

$$\frac{\partial f}{\partial a} = \frac{1 - e^2}{1 - e\cos\theta},$$

$$\frac{\partial f}{\partial e} = \frac{a(1 - e^2)\cos\theta - 2ae(1 - e\cos\theta)}{(1 - e\cos\theta)^2}.$$

However, the function we wish to minimize is the residuals function, r − f, so we need 
the negatives of these derivatives. Here is the working code and the fit result (Figure 
8.21). 

Figure 8.21 Nonlinear least squares fitting of data to the equation of an ellipse in polar 
coordinates. 

Listing 8.18 Nonlinear least squares fit to an ellipse 

# eg8-leastsq.py
import numpy as np
from scipy import optimize
import pylab

def f(theta, p):
    a, e = p
    return a * (1 - e**2)/(1 - e*np.cos(theta))

# The data to fit
theta = np.array([0.0000, 0.4488, 0.8976, 1.3464, 1.7952, 2.2440, 2.6928,
                  3.1416, 3.5904, 4.0392, 4.4880, 4.9368, 5.3856, 5.8344, 6.2832])
r = np.array([4.6073, 2.8383, 1.0795, 0.8545, 0.5177, 0.3130, 0.0945, 0.4303,
              0.3165, 0.4654, 0.5159, 0.7807, 1.2683, 2.5384, 4.7271])

def residuals(p, r, theta):
    """ Return the observed - calculated residuals using f(theta, p). """
    return r - f(theta, p)

def jac(p, r, theta):
    """ Calculate and return the Jacobian of residuals. """
    a, e = p
    da = (1 - e**2)/(1 - e*np.cos(theta))
    de = (-2*a*e*(1-e*np.cos(theta)) + a*(1-e**2)*np.cos(theta))/(1 -
                                                e*np.cos(theta))**2
    # One row of derivatives per parameter, as required when col_deriv=True
    return -da, -de

# Initial guesses for a, e
p0 = (1, 0.5)
plsq = optimize.leastsq(residuals, p0, Dfun=jac, args=(r, theta), col_deriv=True)
print(plsq)

pylab.polar(theta, r, 'x')
theta_grid = np.linspace(0, 2*np.pi, 200)
pylab.polar(theta_grid, f(theta_grid, plsq[0]), lw=2)
pylab.show()
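An analytical Jacobian is easy to get wrong (a dropped sign is enough), so it is worth comparing it with a numerical estimate before passing it to leastsq. The following is a minimal sketch of such a check using central finite differences. The helper numerical_jac, the step size h and the test point p_test are assumptions made here for illustration and are not part of the listing.

```python
import numpy as np

def f(theta, p):
    a, e = p
    return a * (1 - e**2) / (1 - e*np.cos(theta))

def residuals(p, r, theta):
    return r - f(theta, p)

def jac(p, r, theta):
    """Analytical Jacobian of residuals: one row per parameter."""
    a, e = p
    da = (1 - e**2) / (1 - e*np.cos(theta))
    de = ((-2*a*e*(1 - e*np.cos(theta)) + a*(1 - e**2)*np.cos(theta))
          / (1 - e*np.cos(theta))**2)
    return -da, -de

def numerical_jac(p, r, theta, h=1e-6):
    """Central-difference estimate of the same Jacobian (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    rows = []
    for i in range(len(p)):
        dp = np.zeros_like(p)
        dp[i] = h
        rows.append((residuals(p + dp, r, theta)
                     - residuals(p - dp, r, theta)) / (2*h))
    return np.array(rows)

theta = np.linspace(0, 2*np.pi, 15)
r = np.zeros_like(theta)        # the data cancel in the derivative
p_test = (1.5, 0.6)             # an arbitrary test point

analytic = np.array(jac(p_test, r, theta))
numeric = numerical_jac(p_test, r, theta)
max_diff = np.max(np.abs(analytic - numeric))
print('Largest discrepancy:', max_diff)
```

A discrepancy many orders of magnitude smaller than the derivatives themselves suggests the analytical expressions have been coded correctly.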


SciPy also includes a curve-fitting function, scipy.optimize.curve_fit, that 
can fit data to a function directly (without the need for an additional function to calculate 
the residuals) and supports weighted least squares fitting. The call signature is 

curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma)

where f is the function to fit to the data (xdata, ydata). p0 is the initial guess 
for the parameters, and sigma, if provided, gives the uncertainties in the ydata values, 
which determine their weights. If absolute_sigma is True, these are treated as one 
standard deviation errors (that is, absolute weights); the default, absolute_sigma=False, 
treats them as relative weights. 

The curve_fit function returns popt, the best-fit values of the parameters, and 
pcov, the covariance matrix of the parameters. 


Example E8.23 To illustrate the use of curve_fit in weighted and unweighted least 
squares fitting, the following program fits the Lorentzian line shape function centered 
at $x_0$ with half width at half-maximum (HWHM), γ, and amplitude, A: 

$$f(x) = \frac{A\gamma^2}{\gamma^2 + (x - x_0)^2},$$

to some artificial noisy data. The fit parameters are A, γ and $x_0$. The noise is such that a 
region of the data close to the line center is much noisier than the rest. 

Listing 8.19 Weighted and unweighted least squares fitting with curve_fit 

# eg8-curve-fit.py
import numpy as np
from scipy.optimize import curve_fit
import pylab

x0, A, gamma = 12, 3, 5

n = 200
x = np.linspace(1, 20, n)
yexact = A * gamma**2 / (gamma**2 + (x-x0)**2)

# Add some noise with a sigma of 0.5 apart from a particularly noisy region
# near x0 where sigma is 3
sigma = np.ones(n)*0.5
sigma[np.abs(x-x0+1)<1] = 3
noise = np.random.randn(n) * sigma
y = yexact + noise

def f(x, x0, A, gamma):
    """ The Lorentzian centered at x0 with amplitude A and HWHM gamma. """
    return A * gamma**2 / (gamma**2 + (x-x0)**2)

def rms(y, yfit):
    return np.sqrt(np.sum((y-yfit)**2))

# Unweighted fit
p0 = 10, 4, 2
popt, pcov = curve_fit(f, x, y, p0)
yfit = f(x, *popt)
print('Unweighted fit parameters:', popt)
print('Covariance matrix:'); print(pcov)
print('rms error in fit:', rms(yexact, yfit))
print()

# Weighted fit
popt2, pcov2 = curve_fit(f, x, y, p0, sigma=sigma, absolute_sigma=True)
yfit2 = f(x, *popt2)
print('Weighted fit parameters:', popt2)
print('Covariance matrix:'); print(pcov2)
print('rms error in fit:', rms(yexact, yfit2))

pylab.plot(x, yexact, label='Exact')
pylab.plot(x, y, 'o', label='Noisy data')
pylab.plot(x, yfit, label='Unweighted fit')
pylab.plot(x, yfit2, label='Weighted fit')
pylab.ylim(-1,4)
pylab.legend(loc='lower center')
pylab.show()


As Figure 8.22 shows, the unweighted fit is thrown off by the noisy region. Data in 
this region are given a lower weight in the weighted fit and so the parameters are closer 
to their true values and the fit better. The output is 

Unweighted fit parameters: [ 11.61282984   3.64158981   3.93175714]
Covariance matrix:
[[ 0.0686249  -0.00063262  0.00231442]
 [-0.00063262  0.06031262 -0.07116127]
 [ 0.00231442 -0.07116127  0.16527925]]
rms error in fit: 4.10434012348

Weighted fit parameters: [ 11.90782988   3.0154818    4.7861561 ]
Covariance matrix:
[[ 0.01893474 -0.00333361  0.00639714]
 [-0.00333361  0.01233797 -0.02183039]
 [ 0.00639714 -0.02183039  0.06062533]]
rms error in fit: 0.694013741786

Figure 8.22 Example of least squares fit with scipy.optimize.curve_fit. 
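The covariance matrix returned by curve_fit is commonly summarized as one standard error per parameter, taken as the square roots of its diagonal elements. The sketch below applies this to the weighted-fit output printed above; the numbers are copied from that output rather than recomputed, and the names perr and corr are introduced here for illustration.

```python
import numpy as np

# Weighted-fit results copied from the program output above
popt2 = np.array([11.90782988, 3.0154818, 4.7861561])
pcov2 = np.array([[ 0.01893474, -0.00333361,  0.00639714],
                  [-0.00333361,  0.01233797, -0.02183039],
                  [ 0.00639714, -0.02183039,  0.06062533]])

# One-standard-deviation uncertainties on x0, A and gamma
perr = np.sqrt(np.diag(pcov2))
# Correlation matrix: covariance scaled by the product of standard errors
corr = pcov2 / np.outer(perr, perr)

for name, value, err in zip(('x0', 'A', 'gamma'), popt2, perr):
    print('{} = {:.2f} +/- {:.2f}'.format(name, value, err))
print(corr)
```

For this particular run, each fitted parameter lies within one standard error of its true value (12, 3 and 5).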


8.4.3 Root-finding 

scipy.optimize provides several methods for obtaining the roots of both univariate 
and multivariate functions. We describe here only the algorithms relating to functions 
of a single variable: brentq, brenth, ridder and bisect. Each of these methods 
requires a continuous function, f(x), and a pair of numbers defining a bracketing interval 
for the root to find; that is, values a and b such that the root lies in the interval [a, b] 
and f(a) and f(b) have opposite signs. Details of the algorithms behind these root-finding 
methods can be found in standard textbooks on numerical analysis.¹¹ 

In general, the method of choice for finding the root of a well-behaved function 
is scipy.optimize.brentq, which implements a version of Brent’s method with 
inverse quadratic extrapolation (scipy.optimize.brenth is a similar algorithm but 
with hyperbolic extrapolation). As an example, consider the following function for 
−1 < x < 1: 

$$f(x) = \frac{1}{5} + x\cos\left(\frac{3}{x}\right).$$

A plot of this function (Figure 8.23) suggests there is a root between −0.7 and −0.5: 

11 For example, Press et al., (2007). Numerical Recipes, 3rd ed., Cambridge University Press. 

In [x]: f = lambda x: 0.2 + x*np.cos(3/x) 

In [x]: x = np.linspace(-1, 1, 1000) 

In [x]: pylab.plot(x,f(x)) 

In [x]: pylab.axhline(0, color='k') 

In [x]: pylab.show() 

In [x]: from scipy.optimize import brentq 
In [x]: brentq(f, -0.7, -0.5) 

Out[x]: -0.5933306271014237 

The algorithm for root-finding known as Ridder’s method is implemented in the 
function scipy.optimize.ridder and the slower but very reliable (for continuous 
functions) method of bisection is scipy.optimize.bisect. 

Finally, root-finding by the Newton-Raphson algorithm can be very fast (quadratic 
convergence) for many continuous functions, provided the first derivative, f′(x), can be 
calculated. For functions for which an analytical expression for f′(x) can be coded, this 
is passed to the method scipy.optimize.newton as the argument fprime along with a 
starting point, x0, which should (in general) be as near to the root as possible. It is not 
necessary to bracket the root. If f′(x) cannot be provided, the secant method is used by 
newton. If you are in the happy position of being able to provide the second derivative, 
f″(x), as fprime2 as well as the first, Halley’s method (which converges even faster than 
the basic Newton-Raphson algorithm) is used instead. 
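As a minimal sketch of the Newton-Raphson call, the root found earlier with brentq can be recovered with newton by supplying the analytical derivative of f(x) = 1/5 + x cos(3/x), which is f′(x) = cos(3/x) + (3/x) sin(3/x). The starting point x0 = −0.6 is an assumption chosen here to lie close to the root, and the final line evaluates the function at the returned value as a check on convergence.

```python
import numpy as np
from scipy.optimize import newton

def f(x):
    return 0.2 + x * np.cos(3 / x)

def fprime(x):
    # d/dx [x cos(3/x)] = cos(3/x) + (3/x) sin(3/x)
    return np.cos(3 / x) + (3 / x) * np.sin(3 / x)

# Start close to the root bracketed between -0.7 and -0.5
root = newton(f, x0=-0.6, fprime=fprime)

# Verify the result: f(root) should be (close to) zero
residual = f(root)
print(root, residual)
```

Because this function oscillates ever more rapidly as x approaches zero, a starting point further from the root may lead newton to a different root or to no convergence at all.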

Note that the stopping condition within the iterative algorithm used by newton is the 
step size so there is no guarantee that it has converged on the desired root: the result 
should be verified by evaluating the function at the returned value to check that it is 
(close to) zero. 

Table 8.6 Population data for voles measured by Leslie and Ranson 

x /weeks    m(x)      P(x)
 8          0.6504    0.83349
16          2.3939    0.73132
24          2.9727    0.58809
32          2.4662    0.43343
40          1.7043    0.29277
48          1.0815    0.18126
56          0.6683    0.10285
64          0.4286    0.05348
72          0.3000    0.02549


Example E8.24 In ecology, the Euler-Lotka equation describes the growth of a population 
in terms of P(x), the fraction of individuals alive at age x, and m(x), the mean 
number of live females born per time period per female alive during that time period: 

$$\sum_{x=\alpha}^{\beta} P(x) m(x) e^{-rx} = 1,$$

where α and β are the boundary ages for reproduction, defining the discrete growth rate, 
$\lambda = e^r$. $r = \ln\lambda$ is known as Lotka’s intrinsic rate of natural increase. 

In a paper by Leslie and Ranson,¹² P(x) and m(x) were measured for a population 
of voles (Microtus agrestis) using a time period of eight weeks. The data are given in 
Table 8.6. 

The sum $R_0 = \sum_{x=\alpha}^{\beta} P(x) m(x)$ gives the ratio between the total number of female 
births in successive generations; a population grows if $R_0 > 1$ and r determines how 
fast this growth is. In order to find r, Leslie and Ranson used an approximate numerical 
method; the code given here determines r by finding the real root of the Euler-Lotka 
equation directly (it can be shown that there is only one). 

Listing 8.20 Solution of the Euler-Lotka equation 

# eg8-euler-lotka.py
import numpy as np
from scipy.optimize import brentq

# The data, from Table 6 of:
# P. H. Leslie and R. M. Ranson, J. Anim. Ecol. 9, 27 (1940)
x = np.linspace(8, 72, 9)
m = np.array( [0.6504, 2.3939, 2.9727, 2.4662, 1.7043,
               1.0815, 0.6683, 0.4286, 0.3000] )
P = np.array( [0.83349, 0.73132, 0.58809, 0.43343, 0.29277,
               0.18126, 0.10285, 0.05348, 0.02549] )

# Calculate the product sequence f and R0, the ratio between the number of
# female births in successive generations.
f = P * m
R0 = np.sum(f)
if R0 > 1:
    msg = 'R0 > 1: population grows'
else:
    msg = 'Population does not grow'

# The Euler-Lotka equation: we seek the one real root in r
def func(r):
    return np.sum(f * np.exp(-r * x)) - 1

# Bracket the root and solve with scipy.optimize.brentq
a, b = 0, 10
r = brentq(func, a, b)
print('R0 = {:.3f} ({})'.format(R0, msg))
print('r = {:.5f} (lambda = {:.5f})'.format(r, np.exp(r)))


12 P. H. Leslie and R. M. Ranson, (1940). J. Anim. Ecol. 9, 27. 

The output of this program is as follows: 

R0 = 5.904 (R0 > 1: population grows)
r = 0.08742 (lambda = 1.09135)

This value of r may be compared with the approximate value obtained by Leslie and 
Ranson, who comment: 

The required root is 0.087703 which slightly overestimates the value of r, to which the series is 
approaching. This lies between 0.0861 (the third degree approximation) and 0.0877, but nearer 
the latter than the former, the error being probably in the last decimal place. 


8.4.4 Exercises 

Questions 


Q8.4.1 Use scipy.optimize.brentq to find the solutions to the equation 

$$x + 1 = -\frac{1}{(x - 3)^3}.$$


Q8.4.2 Using scipy.optimize.newton to find a root of the following functions 
(with the given starting point, $x_0$) fails. Explain why and find the roots either by 
modifying the call to newton or by using a different method. 

a. $f(x) = x^3 - 5x$, $x_0 = 1$ 

b. $f(x) = x^3 - 3x + 1$, $x_0 = 1$ 

c. $f(x) = 2 - x^5$, $x_0 = 0.01$ 

d. $f(x) = x^4 - (4.29)x^2 - 5.29$, $x_0 = 0.8$ 


Q8.4.3 The trajectory of a projectile in the xz-plane launched from the origin at an 
angle $\theta_0$ with speed $v_0 = 25$ m s⁻¹ is 

$$z = x\tan\theta_0 - \frac{g}{2 v_0^2 \cos^2\theta_0}\, x^2.$$

If the projectile passes through the point (5, 15), use Brent’s method to determine the 
possible values of $\theta_0$. 


Problems 

P8.4.1 A rectangular field with area A = 10,000 m² is to be fenced off beside a 
straight river (the boundary with the river does not need to be fenced). What dimensions 
a, b minimize the amount of fencing required? Verify that a constrained minimization 
algorithm gives the same answer as the algebraic analysis. 

P8.4.2 Find all of the roots of 

$$f(x) = \frac{1}{5} + x\cos\left(\frac{3}{x}\right)$$

using (a) scipy.optimize.brentq and (b) scipy.optimize.newton. 

P8.4.3 The Wien displacement law predicts that the wavelength of maximum emission 
from a black body described by Planck’s law is proportional to 1/T: 

$$\lambda_\mathrm{max} T = b,$$

where b is a constant known as Wien’s displacement constant. Given the Planck 
distribution of emitted energy density as a function of wavelength, 

$$u(\lambda, T) = \frac{8\pi hc}{\lambda^5}\,\frac{1}{e^{hc/\lambda k_B T} - 1},$$

determine the constant b by using scipy.optimize.minimize_scalar to find the 
maximum in u(λ, T) for temperatures in the range 500 K ≤ T ≤ 6000 K and fitting 
$\lambda_\mathrm{max}$ to a straight line against 1/T. Compare with the “exact” value of b, which is 
available within scipy.constants (see Section 8.1.1). 



P8.4.4 Consider a one-dimensional quantum mechanical particle in a box (−1 ≤ x ≤ 1) 
described by the Schrödinger equation: 

$$-\frac{d^2\psi}{dx^2} = E\psi,$$

in energy units for which $\hbar^2/(2m) = 1$ with m the mass of the particle. The exact 
solution for the ground state of this system is given by 

$$\psi = \cos\left(\frac{\pi x}{2}\right), \quad E = \frac{\pi^2}{4}.$$

An approximate solution may be arrived at using the variational principle by minimizing 
the expectation value of the energy of a trial wavefunction, 

$$\psi_\mathrm{trial} = \sum_{n=0}^{N} a_n \phi_n(x),$$

with respect to the coefficients $a_n$. Taking the basis functions to have the following 
symmetrized polynomial form, 

$$\phi_n = (1 - x)^{N-n+1}(x + 1)^{n+1},$$

use scipy.optimize.minimize and scipy.integrate.quad to find the optimum 
value of the expectation value (Rayleigh-Ritz ratio): 

$$\mathcal{E} = \frac{\langle\psi_\mathrm{trial}|\hat{H}|\psi_\mathrm{trial}\rangle}{\langle\psi_\mathrm{trial}|\psi_\mathrm{trial}\rangle} = \frac{-\int_{-1}^{1}\psi_\mathrm{trial}\,\frac{d^2}{dx^2}\psi_\mathrm{trial}\,dx}{\int_{-1}^{1}\psi_\mathrm{trial}\,\psi_\mathrm{trial}\,dx}.$$

Compare the estimated energy, $\mathcal{E}$, with the exact answer for N = 1, 2, 3, 4. (Hint: use 
np.polynomial.Polynomial objects to represent the basis and trial wavefunctions.) 






9 General scientific programming 


9.1 Floating point arithmetic 


9.1.1 The representation of real numbers 


The real numbers, such as 1.2, −0.36, π, 4 and 13256.625 may be thought of as points 
on a continuous, infinite number line.¹ Some real numbers (including the integers 
themselves) can be expressed as a ratio of two integers, for example, 5/8 and 1/3. Such numbers 
are called rational. Others, such as π, e and √2 cannot and are called irrational. 

There can therefore be several ways of writing a real number, depending on which 
category it falls into, and not all of these ways can express the number precisely (using 
a finite amount of ink!). For example, the rational real number 5/8 may be written exactly 
as a decimal expansion as 0.625: 

$$\frac{5}{8} = \frac{6}{10} + \frac{2}{100} + \frac{5}{1000},$$

but the number 1/3 cannot be written in a finite number of terms of a decimal expansion: 

$$\frac{1}{3} = \frac{3}{10} + \frac{3}{100} + \frac{3}{1000} + \cdots = 0.333\cdots$$

In writing 1/3 as a decimal expansion we must truncate the infinite sequence of 3s 
somewhere. 

The irrational numbers can be described exactly (given some presumed geometrical 
or other knowledge), for example, π is the ratio of a circle’s circumference to its 
diameter, √2 is the length of the hypotenuse of a right-angled triangle whose other sides 
have length 1. To represent or store such a number numerically, however, some level of 
approximation is necessary. For example, 22/7 is a famous rational approximation to π. 
A (better) decimal approximation is 3.14159265358979. But, as a decimal expansion,² 
an infinite number of digits are needed to express the value of π precisely, just as an 
infinite number of 3s are needed in the decimal expansion of 1/3. 

Computers store numbers in binary, and the same considerations that apply to the 
limits of the decimal representation of a real number apply to its binary representation. 
For example, 5/8 has an exact binary representation in a finite number of bits: 

$$\frac{5}{8} = \frac{1}{2} + \frac{0}{4} + \frac{1}{8} = 0.101_2$$

but 

$$\frac{1}{10} = 0.000110011001100110011\cdots_2,$$

an infinitely repeating sequence. Only a finite number of these digits can be stored, and 
the truncated series of bits converted back to decimal is 

$$\frac{1}{10} \approx 0.1000000000000000055$$

using the so-called double-precision standard common to most computer languages on 
most operating systems. This is the nearest representable number to 1/10. 

1 Obviously, an integer such as 4 is just a special sort of real number. 

2 Note that a decimal expansion is simply a rational number with a power of 10 in the denominator, 
3.14159265358979 = 314159265358979/100000000000000. 


The format of the double-precision floating point representation of numbers is 
dictated by the IEEE-754 standard. There are three parts to the representation, stored in a 
total of 64 bits (8 bytes): the single sign bit, an 11-bit exponent and a 52-bit significand 
(also called the mantissa). This is best demonstrated by an example in decimal: the 
number 13256.625 can be written in scientific notation as: 

$$13256.625 = +1.3256625 \times 10^{4}$$

and stored with the sign bit corresponding to +, a significand equal to 13256625 (where 
the decimal point is implicitly to be placed after the first digit) and the exponent 4. This 
notation is called “floating point” because the decimal point³ is moved by the number 
of places indicated by the exponent. 

The floating point representation of numbers in binary works in the same way, except 
that each digit can only be 0 or 1, of course. This allows for a neat trick: when the 
number’s binary point (equivalent to the decimal point in base-10) is shifted so that 
its significand has no leading zeros, then it will start with 1. Because all significands 
normalized in this way will start with 1, there is no need to store it, and effectively 53 
bits of precision can be stored in a 52-bit significand.⁴ The omitted bit is sometimes 
called the hidden bit. 

In our example, 13256.625 can in fact be represented exactly in binary as 

$$13256.625_{10} = 11001111001000.101_2$$

The normalized form of the significand is therefore 11001111001000101 and the 
exponent is 13, since: 

$$11001111001000.101_2 = 1.1001111001000101_2 \times 2^{13}.$$

Now, as discussed, the first digit of the normalized significand will always be 1, so it is 
omitted and the significand is stored as 

1001111001000101000000000000000000000000000000000000 

3 More generally known as the radix point in bases other than base-10. 

4 Note that this trick works only in the binary system. 

In order to allow for negative exponents (numbers with magnitudes less than 1), the 
exponent is stored with a bias: 1023 is added to the actual exponent. That is, actual 
exponents in the range −1022 to +1023 are stored as numbers in the range 1 to 2046. 
In this case, the 11-bit exponent field is 13 + 1023 = 1036: 

10000001100 

Finally, the sign bit is 0, indicating a positive number. The full, 64-bit floating point 
representation of 13256.625 (with spaces for clarity) is 

0 10000001100 1001111001000101000000000000000000000000000000000000 

and is exact. However, 0.1 is 

0 01111111011 1001100110011001100110011001100110011001100110011010 

and is not exact (note the truncation and rounding of the infinitely repeating sequence 
0011 ...) - in decimal, this number is 

0.100000000000000005551115123126 
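These bit patterns can be inspected directly from Python. The following is a minimal sketch using the standard library struct module to reinterpret the eight bytes of a double as a 64-bit integer, and decimal.Decimal to display the exact decimal value of the stored number; the helper name float_bits is introduced here for illustration.

```python
import struct
from decimal import Decimal

def float_bits(x):
    """Return the sign, exponent and significand fields of a double as bit strings."""
    # Pack as a big-endian double, then unpack the same 8 bytes as an unsigned integer
    (n,) = struct.unpack('>Q', struct.pack('>d', x))
    bits = '{:064b}'.format(n)
    return bits[0], bits[1:12], bits[12:]

sign, exponent, significand = float_bits(13256.625)
print(sign, exponent, significand)
# Remove the bias of 1023 to recover the actual exponent
print(int(exponent, 2) - 1023)

print(*float_bits(0.1))
# The exact decimal value of the double nearest to 0.1
print(Decimal(0.1))
```

The fields printed for 13256.625 and 0.1 can be compared with the 64-bit patterns given above.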

In general, the 53 bits (including the hidden bit) of the significand give about 15 
decimal digits of precision ($\log_{10}(2^{53}) \approx 15.95$). Any calculation resulting in more 
significant digits is subject to rounding error. The upper bound of the relative error due 
to rounding is called the machine epsilon, ε. In Python, 

In [x]: import sys
In [x]: eps = sys.float_info.epsilon
In [x]: eps
Out[x]: 2.220446049250313e-16

It can be shown that the spacing between adjacent normalized floating point 
numbers near x is at most ε|x|; that is, the spacing is relative to the magnitude of x. 
Consequently, x * (1 + 2*eps) == x is guaranteed to be False for any normalized, nonzero 
x, whereas x + 2*eps == x is False only when |x| is of order 1 or smaller (it is True for 
x = 1e10, for example). 
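The way the gap between neighboring floating point numbers scales with magnitude can be checked numerically. The sketch below is an illustration that relies on np.spacing, which returns the distance from its argument to the next representable number; the test values 1.0 and 1e10 are arbitrary choices made here.

```python
import sys
import numpy as np

eps = sys.float_info.epsilon

# Distance from x to the next representable double
gap_at_one = np.spacing(1.0)
gap_at_1e10 = np.spacing(1e10)
print(gap_at_one, gap_at_1e10)

# Adding eps to 1.0 reaches the next double; adding eps/2 rounds back to 1.0
print(1.0 + eps == 1.0, 1.0 + eps / 2 == 1.0)

# At 1e10 the gap is about 1.9e-6, so adding 2*eps changes nothing,
# but scaling by (1 + 2*eps) does
print(1e10 + 2 * eps == 1e10, 1e10 * (1 + 2 * eps) == 1e10)
```

This is why tolerances for comparing floating point numbers are usually expressed relative to the magnitude of the values being compared.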

9.1.2 Comparing floating point numbers 

Because of the finite precision of the floating point representation of (most) real 
numbers it is extremely risky to compare two floats for equality. For example, consider 
squaring 0.1: 

In [x]: (0.1)**2
Out[x]: 0.010000000000000002

As we have come to expect, this is not exactly 0.01, but it is also not even the nearest 
representable number to 0.01, since the number squared was, in fact, approximately 
0.1000000000000000055. The unfortunate consequence of this is 

In [x]: (0.1)**2 == 0.01
Out[x]: False

NumPy provides the functions isclose and allclose (see Section 6.1.12) for 
comparing two floating point numbers or arrays to within a specified or default tolerance: 

In [x]: np.isclose(0.1**2, 0.01)
Out[x]: True

Note also that floating point addition is not necessarily associative: 

In [x]: a, b, c = 1e14, 25.44, 0.74
In [x]: (a + b) + c
Out[x]: 100000000000026.17

In [x]: a + (b + c)
Out[x]: 100000000000026.19

Nor, in general, is floating point multiplication distributive over addition: 

In [x]: a, b, c = 100, 0.1, 0.2
In [x]: a*b + a*c
Out[x]: 30.0

In [x]: a * (b + c)
Out[x]: 30.000000000000004
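When comparing floats within a tolerance, it helps to know how the tolerance is defined. The sketch below contrasts np.isclose, whose defaults combine a relative tolerance (rtol=1e-5) with an absolute one (atol=1e-8), with the standard library function math.isclose, whose default is purely relative (rel_tol=1e-9, abs_tol=0). The comparison of 1e-12 with zero is an example chosen here to expose the difference.

```python
import math
import numpy as np

x = 0.1**2

exact_equal = (x == 0.01)                   # exact comparison fails
close_math = math.isclose(x, 0.01)          # relative tolerance 1e-9
close_numpy = bool(np.isclose(x, 0.01))     # rtol=1e-5 and atol=1e-8

# Comparing with zero: a purely relative tolerance can never succeed
tiny_vs_zero_math = math.isclose(1e-12, 0.0)
tiny_vs_zero_numpy = bool(np.isclose(1e-12, 0.0))
tiny_vs_zero_abs = math.isclose(1e-12, 0.0, abs_tol=1e-9)

print(exact_equal, close_math, close_numpy)
print(tiny_vs_zero_math, tiny_vs_zero_numpy, tiny_vs_zero_abs)
```

Neither default is right for every problem: the appropriate absolute tolerance depends on the scale of the quantities involved.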


9.1.3 Loss of significance 

Most floating point operations (such as addition and subtraction) result in a loss of 
significance. That is, the number of significant digits in the result can be smaller than 
in the original numbers (operands) used in the calculation. To illustrate this, consider a 
hypothetical floating point representation working in decimal with a 6-digit significand 
and perform the following calculation, written in its exact form: 

1.2345432 − 1.23451 = 0.0000332 

Our hypothetical system cannot store the first operand to its full precision but can only 
get as close as 1.23454. The floating point subtraction then yields 

1.23454 − 1.23451 = 0.00003. 

The original numbers were accurate in the most significant six digits, but the result is 
only accurate in its first significant digit. Note that it isn’t the case that the exact result 
cannot be represented in all its digits by our floating point architecture: 0.0000332 = 
3.32 × 10⁻⁵ only has three significant digits, well within the six available to us. The 
drastic loss of significance occurred because there was only a very small difference 
between the two numbers. This effect is sometimes called catastrophic cancellation and 
should always be a consideration when subtracting two numbers with similar values. 

A similar loss of significance can occur when a small number is subtracted from (or 
added to) a much larger one: 

12345.6 + 0.123456 = 12345.723456 (exactly), 

12345.6 + 0.123456 = 12345.7 (6-digit decimal significand). 
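The hypothetical 6-digit decimal system can be imitated with the standard library decimal module by setting the precision of a local context to six significant digits. This is a sketch of that idea only: decimal rounds the result of each operation to the context precision, so the unary plus is used here to round the stored operands in the same way.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 6                      # 6-digit decimal significand
    a = +Decimal('1.2345432')         # unary plus rounds to the context precision
    b = +Decimal('1.23451')
    diff = a - b                      # catastrophic cancellation
    total = Decimal('12345.6') + Decimal('0.123456')

# The same subtraction at the default precision of 28 digits
exact_diff = Decimal('1.2345432') - Decimal('1.23451')

print(a, diff, exact_diff)
print(total)
```

The stored operand, the one-digit difference and the truncated sum reproduce the values worked through by hand above.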

Even though the 15 or so significant digits of a double-precision floating point number 
may seem like sufficient accuracy for a single calculation, be aware that repeatedly 
carrying out such calculations can increase this rounding error dramatically if the numbers 
involved cannot be represented exactly. For example, consider the following: 

In [x]: a = 0
In [x]: for i in range(10000000):
   ...:     a += 0.1
   ...:

In [x]: a
Out[x]: 999999.9998389754

The difference between this approximate value and the exact answer, 1000000, is over 
1.61 × 10⁻⁴. 

Python’s math module has a function, fsum, which uses a technique called the
Shewchuk algorithm to compensate for rounding errors and loss of significance. Compare
these two implementations of the previous sum using a generator expression:

In [x]: import math
In [x]: sum((0.1 for i in range(10000000)))
Out[x]: 999999.9998389754

In [x]: math.fsum((0.1 for i in range(10000000)))
Out[x]: 1000000.0
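To see how a running correction can recover lost digits, here is a sketch of Kahan (compensated) summation. This is a simpler technique than the one used by fsum and is shown only as an illustration; the function name kahan_sum is our own. An explicit loop is used for the naive total because the built-in sum is itself more accurate for floats in recent Python versions (3.12 onward).

```python
import math

def kahan_sum(values):
    """Sum floats while carrying a correction for lost low-order bits."""
    total = 0.0
    compensation = 0.0              # running estimate of the rounding error
    for value in values:
        adjusted = value - compensation
        new_total = total + adjusted
        # (new_total - total) is what was actually added; subtracting
        # adjusted recovers the part that was rounded away.
        compensation = (new_total - total) - adjusted
        total = new_total
    return total

values = [0.1] * 1000000

# Naive accumulation, one addition at a time.
naive = 0.0
for value in values:
    naive += value

compensated = kahan_sum(values)
reference = math.fsum(values)

print(naive - reference, compensated - reference)
```

The compensated total agrees with fsum far more closely than the naive one, at the cost of a few extra floating point operations per term.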




9.1.4 Underflow and overflow 

Another consequence of the way that floating point numbers are handled is that there
is a minimum and maximum magnitude of number that can be stored. For example,
Bayesian calculations frequently require small probabilities to be multiplied together,
with each probability a number between 0 and 1. For a large number of such probabilities
this product can reach a value that is too small to represent, resulting in underflow
to zero:

In [x]: P = 1
In [x]: for i in range(101):
   ....:     print(P)
   ....:     P *= 5.e-4
   ....:
1
0.0005
2.5e-07
1.25e-10
6.250000000000001e-14
...
1.0097419586828971e-307
5.0487097934146e-311        # denormalization starts
2.5243548965e-314
1.2621776e-317
6.31e-321
5e-324
0.0                         # underflow
0.0

Below the smallest normalized value (about $2.2 \times 10^{-308}$), Python begins to sacrifice
some of the precision and maintains a modified representation of the number (a denormal,
or subnormal number), a process called gradual underflow. Eventually, however, the number
underflows its representation totally and becomes indistinguishable from zero. The minimum
number that can be represented at full IEEE-754 double precision is

In [x]: import sys
In [x]: sys.float_info.min
Out[x]: 2.2250738585072014e-308
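The boundary between normal and subnormal numbers can be probed directly. The sketch below is an illustration only; it relies on sys.float_info and on the assumption that Python floats are IEEE-754 doubles with a 52-bit fraction field, which holds for standard CPython builds.

```python
import sys

smallest_normal = sys.float_info.min          # 2**-1022

# Halving a number at this limit gives a subnormal: still nonzero,
# but with fewer significant bits available.
subnormal = smallest_normal / 2

# Dividing by 2**52 uses up all 52 fraction bits, leaving the smallest
# positive double; halving once more underflows to zero.
smallest_subnormal = smallest_normal / 2**52
underflowed = smallest_subnormal / 2

print(subnormal, smallest_subnormal, underflowed)
```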

There are several possible tactics for dealing with underflow (beyond using higher
precision numbers such as np.float128). In the earlier example, it is common to take
the sum of the logarithms of the probabilities, which has a much more modest magnitude,
instead of taking the product directly. Alternatively, one could start the earlier code
with P = 1.e100 and manipulate the resulting numbers on the understanding that they
are larger than they should be by this constant factor.
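The logarithm tactic can be sketched as follows, reusing the 101 factors of 5.e-4 from the earlier loop. The variable names are illustrative.

```python
import math

probabilities = [5.e-4] * 101

# Direct product: underflows to zero before the loop finishes.
product = 1.0
for p in probabilities:
    product *= p

# Sum of logarithms: a modest negative number that is easy to store.
log_product = sum(math.log(p) for p in probabilities)

# The base-10 exponent of the true product, recovered from the log sum.
exponent10 = log_product / math.log(10)

print(product, log_product, exponent10)
```

Comparisons between competing products can then be made on the log sums themselves, without ever forming the underflowing product.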

Floating point overflow is the problem at the other end of the number scale: the largest 
double-precision number that can be represented is 

In [x]: sys.float_info.max 
Out[x]: 1.7976931348623157e+308 

In NumPy, numbers that overflow are set to the special values inf or -inf depending
on sign:

In [x]: import numpy as np
In [x]: for x in range(1, 40, 4):
   ....:     print('exp({}) = {}'.format(x**2, np.exp(x**2)))
   ....:
exp(1) = 2.718281828459045
exp(25) = 72004899337.38588
exp(81) = 1.5060973145850306e+35
exp(169) = 2.487524928317743e+73
exp(289) = 3.2441824460394912e+125
exp(441) = 3.340923407659982e+191
exp(625) = 2.7167594696637367e+271
exp(841) = inf
exp(1089) = inf
exp(1369) = inf
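Core Python behaves slightly differently from NumPy here. As a sketch (an addition for illustration): ordinary float arithmetic such as multiplication overflows silently to inf, whereas functions in the math module, and the ** operator applied to floats, raise an OverflowError instead.

```python
import math
import sys

largest = sys.float_info.max

# Multiplication past the largest double overflows silently to inf.
doubled = largest * 2

# math.exp raises instead of returning inf when the result is too large.
try:
    math.exp(841)
    exp_overflowed = False
except OverflowError:
    exp_overflowed = True

# The power operator on floats also raises OverflowError.
try:
    10.0 ** 400
    pow_overflowed = False
except OverflowError:
    pow_overflowed = True

print(doubled, exp_overflowed, pow_overflowed)
```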

This leads to some curious relations between numbers that are too big to represent: 


In [x]: a, b = 1.e500, 1.e1000
In [x]: a == b
Out[x]: True

In [x]: a, b
Out[x]: (inf, inf)


There is another special value, nan (“not-a-number”, NaN), which is returned by
some operations involving overflowed numbers:

In [x]: a / b
Out[x]: nan

(NumPy also implements its own values, numpy.nan and numpy.inf; see Section
6.1.4.) Never check if an object is nan with the == operator: nan is not even equal
to itself(!):⁵

In [x]: c = a / b
In [x]: c == c
Out[x]: False

5 Note that this means that the == operator is not an equivalence relation for floating point numbers as it is
not reflexive.
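The recommended test uses a dedicated predicate such as math.isnan. A short sketch, with illustrative variable names:

```python
import math

inf = float('inf')
not_a_number = inf / inf        # inf / inf is undefined, so the result is nan

# Equality tests are useless for nan ...
equal_to_itself = (not_a_number == not_a_number)

# ... so use the dedicated predicate instead.
detected = math.isnan(not_a_number)

# nan also propagates through further arithmetic.
still_nan = math.isnan(not_a_number * 0 + 1)

print(equal_to_itself, detected, still_nan)
```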

Python int objects are not subject to overflow, as Python will automatically allocate
memory to hold them to full precision (within the limitations of available machine memory).
However, NumPy integer arrays, which map to the underlying C data structures,
are stored in a fixed number of bytes (see Table 6.2) and may overflow. For example,

In [x]: a = np.zeros(3, dtype=np.int16)
In [x]: a[:] = -30000, 30000, 40000
In [x]: a
Out[x]: array([-30000,  30000, -25536], dtype=int16)

In [x]: b = np.zeros(3, dtype=np.uint16)
In [x]: b[:] = -30000, 40000, 70000
In [x]: b
Out[x]: array([35536, 40000,  4464], dtype=uint16)

Signed 16-bit integers have the range -32768 to 32767 ($-2^{15}$ to $2^{15} - 1$). Due
to the way they are stored, an attempted assignment of the number 40000 has resulted
instead in the assignment of $40000 - 2^{16} = -25536$ to a[2] above. Similarly, unsigned
16-bit integers are limited to values in the range 0 to 65535 (0 to $2^{16} - 1$). Negative
numbers cannot be represented at all and b[0] = -30000 gets converted to
$-30000 \bmod 2^{16} = 35536$; b[2] = 70000 overflows and ends up as $70000 \bmod 2^{16} = 4464$.
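The modular arithmetic described above can be checked in pure Python. The helper wrap_to_int below is a hypothetical name introduced for this sketch; it mimics the wrap-around of a fixed-width integer without using NumPy.

```python
def wrap_to_int(value, bits=16, signed=True):
    """Return the value a fixed-width integer would hold after wrap-around."""
    modulus = 2 ** bits
    wrapped = value % modulus           # Python's % gives 0 <= wrapped < modulus
    if signed and wrapped >= modulus // 2:
        wrapped -= modulus              # top half of the range maps to negatives
    return wrapped

print(wrap_to_int(40000))                   # signed 16-bit: -25536
print(wrap_to_int(-30000, signed=False))    # unsigned 16-bit: 35536
print(wrap_to_int(70000, signed=False))     # unsigned 16-bit: 4464
```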

9.1.5 Further Reading 

• From the Python documentation: Floating Point Arithmetic: Issues and Limitations,
available at http://docs.python.org/tutorial/floatingpoint.html.

• The article “What Every Computer Scientist Should Know About Floating-Point
Arithmetic” by David Goldberg (Computing Surveys, March 1991) has become
something of a classic and is highly recommended for a rigorous approach to the
topic of floating point arithmetic. It is available at
https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html.

• S. Oliveira and D. Stewart, (2006). Writing Scientific Software: A Guide to Good 
Style, Cambridge University Press. 

• N. J. Higham, (2002). Accuracy and Stability of Numerical Algorithms, 2nd ed., 
Society for Industrial and Applied Mathematics. 


9.1.6 Exercises 
Questions 

Q9.1.1 The decimal representation of some real numbers is not unique. For example,
prove mathematically that $0.\dot{9} = 0.9999\cdots = 1$.

Q9.1.2 $\sqrt{\tan(\pi)} = 0$ is mathematically well-defined, so why does the following
calculation fail with a math domain error?

In [x]: math.sqrt(math.tan(math.pi))

ValueError                                Traceback (most recent call last)
<ipython-input-135-7bfdceeef434> in <module>()
----> 1 math.sqrt(math.tan(math.pi))


Q9.1.3 Fermat’s Last Theorem states that no three positive integers x, y and z can
satisfy the equation $x^n + y^n - z^n = 0$ for any integer n > 2. Explain this apparent
counter-example to the theorem:

In [x]: 844487.**5 + 1288439.**5 - 1318202.**5
Out[x]: 0.0

Q9.1.4 The functions $f(x) = (1 - \cos^2 x)/x^2$ and $g(x) = \sin^2 x/x^2$ are mathematically
indistinguishable, but plotted using Python in the region $-0.001 < x < 0.001$ show a
significant difference. Explain the origin of this difference.

Q9.1.5 How can you establish whether a floating point number is nan or not without
using math.isnan or numpy.isnan?

Q9.1.6 Predict and explain the outcome of the following:

a. 1e1001 > 1e1000

b. 1e350/1.e100 == 1e250

c. 1e250 * 1.e-250 == 1e150 * 1.e-150

d. 1e350 * 1.e-350 == 1e450 * 1.e-450

e. 1 / 1e250 == 1e-250

f. 1 / 1e350 == 1e-350

g. 1e450/1e350 != 1e450 * 1e-350

h. 1e250/1e375 == 1e-125

i. 1e35 / (1e1000 - 1e1000) == 1 / (1e1000 - 1e1000)

j. 1e1001 > 1e1000 or 1e1001 < 1e1000

k. 1e1001 > 1e1000 or 1e1001 <= 1e1000

Problems 

P9.1.1 Heron’s formula for the area of a triangle (as used in Example E2.3),

$$A = \sqrt{s(s - a)(s - b)(s - c)} \quad \text{where} \quad s = \tfrac{1}{2}(a + b + c),$$

is inaccurate if one side is very much smaller than the other two (“needle-shaped”
triangles). Why? Demonstrate that the following reformulation gives a more accurate
result in this case by considering the triangle with sides $(10^{-13}, 1, 1)$, which has the
area $5 \times 10^{-14}$:⁶

$$A = \tfrac{1}{4}\sqrt{(a + (b + c))(c - (a - b))(c + (a - b))(a + (b - c))},$$

where the sides have been relabeled so that $a \ge b \ge c$.
What happens if you rewrite the factors in this equation to remove their inner parentheses?
Why?

6 This formula is due to William Kahan, one of the designers of the IEEE-754 floating point standard.

P9.1.2 Write a function to determine the machine epsilon of a numerical data type
(float, np.float128, int, etc.).


9.2 Stability and conditioning 

9.2.1 The stability of an algorithm 

The stability of an algorithm may be thought of in relation to how it handles approximation
errors that occur in its operation or its input data. These errors typically arise
from experimental uncertainties (imperfect measurements providing the input data)
or from the sort of floating point approximations, discussed in the previous section,
that are involved in the calculations of the algorithm. Another common source of error is in the
approximations made in “discretizing” a problem: the need to represent the values of
a continuous function, $y = f(x)$ say, on a discrete “grid” of points: $y_i = f(x_i)$. An
algorithm is said to be numerically stable if it does not magnify these errors and unstable
if it causes them to grow.


Example E9.1 Consider the differential equation,

$$\frac{dy}{dx} = -\alpha y$$

for $\alpha > 0$ subject to the boundary condition y(0) = 1. This simple problem can be
solved analytically:

$$y = e^{-\alpha x},$$

but suppose we want to solve it numerically. The simplest approach is the forward (or
explicit) Euler method: choose a step size, h, defining a grid of x values, $x_i = x_{i-1} + h$,
and approximate the corresponding y values through:

$$y_i = y_{i-1} + h \left.\frac{dy}{dx}\right|_{x_{i-1}} = y_{i-1} - h\alpha y_{i-1} = y_{i-1}(1 - \alpha h).$$

The question arises: what value should be chosen for h? A small h minimizes the error
introduced by the approximation above, which basically joins y values by straight-line
segments,⁷ but if h is too small there will be cancellation errors due to the finite precision
used in representing the numbers involved.⁸

7 That is, the Taylor series about $y_{i-1}$ has been truncated at the linear term in h.

8 In the extreme case that h is chosen to be smaller than the machine epsilon, typically about $1 \times 10^{-16}$, then
we have $x_i = x_{i-1}$ and there is no grid of points at all.

Figure 9.1 Instability of the forward-Euler solution to $dy/dx = -\alpha y$ for large step size, h.

The following code implements the forward Euler algorithm to solve the earlier
differential equation. The largest value of h (here, $h = \alpha/2 = 1$) clearly makes the
algorithm unstable (see Figure 9.1).

Listing 9.1 Comparison of different step sizes, h, in the numerical solution of y' = -αy by the forward
Euler algorithm

import numpy as np
import pylab

alpha, y0, xmax = 2, 1, 10

def euler_solve(h, n):
    """ Solve dy/dx = -alpha.y by forward Euler method for step size h. """
    y = np.zeros(n)
    y[0] = y0
    for i in range(1, n):
        y[i] = (1 - alpha * h) * y[i-1]
    return y

def plot_solution(h):
    x = np.arange(0, xmax, h)
    y = euler_solve(h, len(x))
    pylab.plot(x, y, label='$h={}$'.format(h))

for h in (0.01, 0.2, 1):
    plot_solution(h)

pylab.legend()
pylab.show()
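The behavior for different step sizes follows from the update rule itself: each step multiplies y by the constant factor (1 - αh), so the numerical solution decays only if |1 - αh| < 1. The following sketch, added for illustration with hypothetical helper names, tabulates that factor for a few step sizes; h = 1.5 is an extra value not used in Listing 9.1.

```python
def amplification_factor(alpha, h):
    """Factor by which one forward-Euler step multiplies y for dy/dx = -alpha*y."""
    return 1 - alpha * h

def euler_final_value(alpha, h, n_steps, y0=1.0):
    """Value of y after n_steps forward-Euler steps starting from y0."""
    y = y0
    for _ in range(n_steps):
        y *= amplification_factor(alpha, h)
    return y

alpha = 2
for h in (0.01, 0.2, 1, 1.5):
    factor = amplification_factor(alpha, h)
    if abs(factor) < 1:
        verdict = 'decays'
    elif abs(factor) == 1:
        verdict = 'oscillates without decaying'
    else:
        verdict = 'grows'
    print('h = {}: factor = {:+.2f}, solution {}'.format(h, factor, verdict))
```

For α = 2 the numerical solution therefore fails to decay once h reaches 1, and grows without bound for any larger step.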
Example E9.2 The integral

$$I_n = \int_0^1 x^n e^x \, dx, \qquad n = 0, 1, 2, \cdots$$

suggests a recursion relation obtained by integration by parts:

$$I_n = \left[x^n e^x\right]_0^1 - n\int_0^1 x^{n-1} e^x \, dx = e - nI_{n-1},$$

terminating with $I_0 = e - 1$. However, this algorithm, applied “forward” for increasing
n, is numerically unstable since small errors (such as floating point rounding errors) are
magnified at each step: if the error in $I_n$ is $\epsilon_n$ such that the estimated value is $I'_n = I_n + \epsilon_n$,
then

$$\epsilon_n = I'_n - I_n = (e - nI'_{n-1}) - (e - nI_{n-1}) = n(I_{n-1} - I'_{n-1}) = -n\epsilon_{n-1},$$

and hence $|\epsilon_n| = n!\,|\epsilon_0|$. Even if the error $\epsilon_0$ is small, $\epsilon_n$ is larger in magnitude by a factor n!,
which can be huge.

The numerically stable solution, in this case, is to apply the recursion backward for
decreasing n:

$$I_{n-1} = \frac{1}{n}(e - I_n) \quad \Rightarrow \quad \epsilon_{n-1} = -\frac{\epsilon_n}{n}.$$

That is, errors in $I_n$ are reduced on each step of the recursion. One can even start the
algorithm at $I'_N = 0$ and, providing enough steps are taken between N and the desired n,
it will converge on the correct $I_n$.

Listing 9.2 Comparison of algorithm stability in the calculation of $I(n) = \int_0^1 x^n e^x \, dx$

# eg9-integral-stability.py
import numpy as np
import pylab

# Recursive forms of the two algorithms. They are shown for reference only:
# the iterative versions below rebind the same names to lists for plotting.
def Iforward(n):
    if n == 0:
        return np.e - 1
    return np.e - n * Iforward(n-1)

def Ibackward(n):
    if n >= 99:
        return 0
    return (np.e - Ibackward(n+1)) / (n+1)

N = 35

# Forward recursion, starting from I(0) = e - 1
Iforward = [np.e - 1]
for n in range(1, N+1):
    Iforward.append(np.e - n * Iforward[n-1])

# Backward recursion, starting from the guess I(N) = 0
Ibackward = [0] * (N+1)
for n in range(N-1, -1, -1):
    Ibackward[n] = (np.e - Ibackward[n+1]) / (n+1)

n = range(N+1)
pylab.plot(n, Iforward, label='Forward algorithm')
pylab.plot(n, Ibackward, label='Backward algorithm')
pylab.ylim(-0.5, 2)
pylab.xlabel('$n$')
pylab.ylabel('$I(n)$')
pylab.legend()
pylab.show()


Figure 9.2 shows the forward algorithm becoming extremely unstable for n > 16 and 
fluctuating between very large positive and negative values; conversely, the backward 
algorithm is well behaved. 
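The two recursions can also be compared without a plot. Since $1 \le e^x \le e$ on the interval [0, 1], the integral must satisfy $1/(n+1) \le I_n \le e/(n+1)$, and any computed value outside these bounds is certainly wrong. The sketch below (an illustration using only the standard library; the function within_bounds is our own) applies this check to both algorithms.

```python
import math

N = 35

# Forward recursion: I_n = e - n*I_(n-1), starting from I_0 = e - 1.
forward = [math.e - 1]
for n in range(1, N + 1):
    forward.append(math.e - n * forward[n - 1])

# Backward recursion: I_(n-1) = (e - I_n)/n, starting from the guess I_N = 0.
backward = [0.0] * (N + 1)
for n in range(N, 0, -1):
    backward[n - 1] = (math.e - backward[n]) / n

def within_bounds(n, value):
    """True if value satisfies 1/(n+1) <= I_n <= e/(n+1)."""
    return 1 / (n + 1) <= value <= math.e / (n + 1)

for n in (5, 10, 15, 20, 25):
    print(n, within_bounds(n, forward[n]), within_bounds(n, backward[n]))
```

The backward values stay inside the bounds even though the starting guess $I_N = 0$ is itself outside them, because each backward step divides the inherited error by n.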


9.2.2 Well-conditioned and ill-conditioned problems 

In numerical analysis, a further distinction is made between problems which are well- 
or ill-conditioned. A well-conditioned problem is one for which small relative errors in 
the input data lead to small relative errors in the solution; an ill-conditioned problem is 
one for which small input errors lead to large errors in the solution. Conditioning is a 
property of the problem, not the algorithm, and is distinct from the issue of stability: it is
perfectly possible to use an unstable algorithm on a well-conditioned problem and end 
up with erroneous results. 


Example E9.3 Consider the two lines given by the equations: 

y = x 

y = mx + c 

These lines intersect at $(x^*, y^*) = (c/(1 - m), c/(1 - m))$. Finding the intersection point
is an ill-conditioned problem when $m \approx 1$ (lines nearly parallel).
For example, the lines y = x and y = (1.01)x + 2 intersect at $(x^*, y^*) = (-200, -200)$.
If we perturb m slightly by $\delta m = 0.001$, to $m' = m + \delta m = 1.011$, the intersection point
becomes $(x^{*\prime}, y^{*\prime}) = (-181.8182, -181.8182)$. That is, a relative error of $\delta m/m \approx 0.001$
in m has created a relative error of $|(x^{*\prime} - x^*)/x^*| \approx 0.091$, almost 100 times larger.

Conversely, if the lines have very different gradients, the problem is well-conditioned.
Take, for example, m = -1 (perpendicular lines): the intersection (1, 1) becomes
(1.0005, 1.0005) under the same perturbation to $m' = m + \delta m = -0.999$, leading
to a relative error of 0.0005, which is actually smaller than the relative error in m.
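These numbers are easy to reproduce. The sketch below, with illustrative function names, computes the ratio of the relative change in the intersection point to the relative change in m for the two cases above (c = 2 in both).

```python
def intersection_x(m, c):
    """x-coordinate where y = x meets y = m*x + c (requires m != 1)."""
    return c / (1 - m)

def error_amplification(m, c, dm):
    """Ratio of the relative change in x* to the relative change in m."""
    x_star = intersection_x(m, c)
    x_perturbed = intersection_x(m + dm, c)
    rel_change_x = abs((x_perturbed - x_star) / x_star)
    rel_change_m = abs(dm / m)
    return rel_change_x / rel_change_m

c, dm = 2, 0.001
for m in (1.01, -1):
    print('m = {}: x* = {:.4f}, amplification = {:.1f}'.format(
        m, intersection_x(m, c), error_amplification(m, c, dm)))
```

An amplification much greater than 1 signals an ill-conditioned problem; a value below 1 means the input error is actually damped.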


Example E9.4 The conditioning of polynomial root-finding is notoriously bad. One
famous example is Wilkinson’s polynomial:

$$P(x) = \prod_{i=1}^{20} (x - i) = (x - 1)(x - 2) \cdots (x - 20)$$
$$\phantom{P(x)} = x^{20} - 210x^{19} + 20615x^{18} + \cdots + 2432902008176640000.$$

By inspection, the roots are simply 1, 2, ..., 20. However, Wilkinson showed that
decreasing the coefficient of $x^{19}$ from -210 to $-210 - 2^{-23} \approx -210.000000119209$
had a drastic effect on many of the roots, some of which become complex. For example,
the root at x = 20 moves to $x \approx 20.8$, a change of 4% on a perturbation of one
coefficient by less than one part in a billion (see also Problem 9.2.2).


Problems 


P9.2.1 The simplest (and least accurate) way to calculate the first derivative of a
function is to simply use the definition:

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.$$

Fixing h at some small value, our approximation is

$$f'(x) \approx \frac{f(x + h) - f(x)}{h}.$$

Using the function $f(x) = e^x$, which value of h (to the nearest power of 10) gives the
most accurate approximation to $f'(1) = e$?

P9.2.2 Use NumPy’s Polynomial class (see Section 6.4) to generate an object representing
Wilkinson’s polynomial from its roots to the available numerical precision; then
find the roots of this representation of the polynomial.


9.3 Programming techniques and software development

9.3.1 General remarks 

Commenting code 

Throughout this book we have tried to comment the code examples and exercise solutions
helpfully. This is a good practice, even for short scripts, but the effective use of
comments is not an entirely trivial activity. Here is some general advice:

• Generally, prefer to place comments on their own lines rather than “inline” with 
code (that is, after but on the same line as the code they describe): 

# Volume of a dodecahedron of side length a 

V = (15 + 7 * np.sqrt(5)) / 4 * a**3 

rather than 

V = (15 + 7 * np.sqrt(5)) / 4 * a**3 # Volume of a dodecahedron of side a 

• Explain why your code does what it does, don’t simply explain what it does. 
Assume that the person reading your code knows the syntax of the language 
already. Thus, 

# Increase i by 10: 
i += 10 

is a terrible comment which adds nothing to the line of code it purports to explain. 
On the other hand, 

# Skip the next 10 data points 
i += 10 

at least gives some indication of the reason for the statement. 

• Keep comments up-to-date with the code they explain. It is all too easy to change
code without synchronizing the corresponding comments. This can lead to a
situation that is worse than having no comment at all:

# Skip the next 10 data points
i += 20

Which is correct? Is the comment correct in explaining the programmer’s intention
but the line of code buggy, or has the line of code been updated for some
reason without changing the comment? If your code is likely to be subject to
such changes, consider defining a separate variable to hold the change in i:

DATA_SKIP = 10

# Skip the next DATA_SKIP data points
i += DATA_SKIP

In fact, some programmers advocate aiming to minimize the number of comments
by carefully choosing meaningful identifier names. For example, if we rename
our index, we might even do away with the comment altogether:

data_index += DATA_SKIP

• Explain functions carefully using docstrings. In Python, all functions have
an attribute __doc__, which is set to the docstring provided in the function
definition (see Section 2.7.1). A docstring is usually a multiline, triple-quoted
string providing an explanation of what the function does, the arguments it takes
and the nature of its return value(s), if any. From an interactive shell, typing
help(function_name) provides more detailed information concerning the
function, including this docstring.
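As a brief illustration of the last point, the following sketch defines a function with a docstring and then reads it back through the __doc__ attribute. The function sphere_volume is a made-up example, not one used elsewhere in this chapter.

```python
import math

def sphere_volume(radius):
    """Return the volume of a sphere of the given radius.

    The radius may be any non-negative number; the result is in the cube of
    whatever length unit the radius is given in.

    """
    return 4 / 3 * math.pi * radius**3

# The docstring is stored on the function object itself ...
print(sphere_volume.__doc__)

# ... and is what help(sphere_volume) displays in an interactive session.
first_line = sphere_volume.__doc__.splitlines()[0]
print(first_line)
```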


Example E9.5 An example of a well-commented function (to calculate the volume of 
a tetrahedron) is given here. 

Listing 9.3 A function to calculate the volume of a tetrahedron 

# eg9-tetrahedron.py
import numpy as np

def tetrahedron_volume(vertices=None, sides=None):
    """
    Return the volume of the tetrahedron with given vertices or side lengths.

    If vertices are given they must be in an array with shape (4,3): the
    position vectors of the four vertices in three dimensions; if the six sides are
    given, they must be an array of length 6. If both are given, the sides
    will be used in the calculation.

    Raises a ValueError if the vertices do not form a tetrahedron (e.g.,
    because they are coplanar, colinear or coincident).

    """

    # This method implements Tartaglia's formula using the Cayley-Menger
    # determinant:
    #             |0   1    1    1    1  |
    #             |1   0   s1^2 s2^2 s3^2|
    #   288 V^2 = |1  s1^2  0   s4^2 s5^2|
    #             |1  s2^2 s4^2  0   s6^2|
    #             |1  s3^2 s5^2 s6^2  0  |
    # where s1, s2, ..., s6 are the tetrahedron side lengths.
    #
    # Warning: this algorithm has not been tested for numerical stability.

    # The indexes of rows in the vertices array corresponding to all
    # possible pairs of vertices
    vertex_pair_indexes = np.array(((0, 1), (0, 2), (0, 3),
                                    (1, 2), (1, 3), (2, 3)))
    if sides is None:
        # If no sides were provided, work them out from the vertices
        vertices = np.asarray(vertices)                      # ❶
        if vertices.shape != (4,3):
            raise TypeError('vertices must be a numpy array with shape (4,3)')
        # Get all the squares of all side lengths from the differences between
        # the 6 different pairs of vertex positions
        vertex1, vertex2 = vertex_pair_indexes.T
        sides_squared = np.sum((vertices[vertex1] - vertices[vertex2])**2,
                               axis=-1)


Downloaded from https:/www.cambridge.org/core. New York University, on 21 Feb 2017 at 16:46:49, subject to the Cambridge Core terms of use, available at 

https:/www.cambridge.org/core/terms. https://d 0 i. 0 rg/l 0.1017/CB09781139871754.009 






    else:
        # Check that sides has been provided as a valid array and square it
        sides = np.asarray(sides)
        if sides.shape != (6,):
            raise TypeError('sides must be an array with shape (6,)')
        sides_squared = sides**2

    # Set up the Cayley-Menger determinant
    M = np.zeros((5,5))
    # Fill in the upper triangle of the matrix
    M[0,1:] = 1
    # The squared-side length elements can be indexed using the vertex
    # pair indexes (compare with the determinant illustrated above)
    M[tuple(zip(*(vertex_pair_indexes + 1)))] = sides_squared

    # The matrix is symmetric, so we can fill in the lower triangle by
    # adding the transpose
    M = M + M.T

    # Calculate the determinant and check it is positive (negative or zero
    # values indicate the vertices do not form a tetrahedron).
    det = np.linalg.det(M)
    if det <= 0:
        raise ValueError('Provided vertices do not form a tetrahedron')
    return np.sqrt(det / 288)

❶ Using np.asarray to convert vertices into a NumPy array if it isn’t one already
enables the function to work with any compatible object (such as a list of lists).


Style Guide for Python Code 

The officially recommended coding conventions for Python are provided by a document
known as PEP8 (available at www.python.org/dev/peps/pep-0008/). While it is
acknowledged that it isn’t always appropriate to follow these conventions all the time,
Python programmers generally agree that they maximize the comprehensibility and
maintainability of code. The focus is on consistency, readability and minimizing the
probability of hard-to-find typographical errors. Some of the highlights are:

• Use four spaces per indentation level (and never tabs).⁹

• In assignments, put spaces around the = sign; for example, a = 10, not a=10.

• Use a maximum of 79 characters per line. Where you need to split a line of code
over more than one line:

- favor implicit line continuation inside parentheses over the explicit use of
the character, \ (see Section 2.3.1);

- in arithmetic expressions, break around binary operators so that the new
line is after the operator;

- as far as possible, line up code so that expressions within parentheses
line up.

For example, the following is considered poor style:

lengthy_calculation = margin*margin_px + (border*border_px\
        + padding*padding_px)

and might be better written as

lengthy_calculation = (margin*margin_px + (border*border_px +
                                           padding*padding_px))

9 A good text editor can be configured to automatically expand tabs to a fixed number of spaces.

• Separate top-level function and class definitions by two blank lines; within a 
class, separate them by one blank line. 

• Use UTF-8 encoding for your source code (in Python 3 this is the default encoding
anyway).

• Avoid wildcard imports (from foo import *).

• Separate operators from their operands with single spaces unless operations with
different priorities are being combined; for example, write x = x + 5 but r2 =
x**2 + y**2.

• Don’t use spaces around the = in keyword arguments; for example, in function
calls use foo(b=4.5) not foo(b = 4.5).

• Avoid putting more than one statement on the same line separated by semicolons;
for example, instead of a = 1; b = 2, write a, b = 1, 2 (see Section 4.3.1).

• Functions, modules and packages should have short, all-lowercase names. Use
underscores in function and module names if necessary, but avoid them in package
names.

• Class names should be in (upper) CamelCase, also known as CapWords; for
example, AminoAcid, not amino_acid.

• Define constants¹⁰ in all-capitals with underscores separating words; for example,
MAX_LINE_LENGTH.
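Several of these conventions are brought together in the short sketch below. The names, the class and the molar masses are invented for illustration and do not come from elsewhere in this book; only the layout is the point.

```python
"""Illustration of several PEP8 naming and spacing conventions."""

AVOGADRO_CONSTANT = 6.02214076e23      # constants: capitals and underscores


class AminoAcid:
    """Class names use CamelCase."""

    def __init__(self, name, molar_mass):
        self.name = name
        self.molar_mass = molar_mass   # g/mol

    def molecule_mass(self):
        """Return the mass of one molecule in grams."""
        return self.molar_mass / AVOGADRO_CONSTANT


def total_molar_mass(amino_acids, water_lost_per_bond=18.015):
    """Functions use lowercase names with underscores where needed."""
    # Implicit continuation inside parentheses, not a trailing backslash.
    return (sum(acid.molar_mass for acid in amino_acids) -
            water_lost_per_bond * (len(amino_acids) - 1))


glycine = AminoAcid('glycine', molar_mass=75.07)   # no spaces around keyword =
alanine = AminoAcid('alanine', molar_mass=89.09)
print(total_molar_mass([glycine, alanine]))
```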


9.3.2 Editors 

While, to some extent, the choice of text editor for writing code is a personal one, most
programmers favor one with syntax highlighting and the possibility to define macros to
speed up repetitive tasks. Popular choices include:

• Sublime Text, a commercial editor with per-user licensing and a free-evaluation 
option; 

• Vim, a widely used cross-platform keyboard-based editor with a steep learning
curve but powerful features. The more basic vi editor is installed on almost all
Linux and Unix operating systems;

• Emacs, a popular alternative to Vim;

• Notepad++, a free Windows-only editor; 

• SciTE, a fast, lightweight source code editor; 

• Atom, another free, open-source, cross-platform editor. 


10 Note that Python doesn’t really have constants in the same way that, for example, C does.

Beyond simple editors, there are fully featured integrated development environments
(IDEs) that also provide debugging, code-execution, intelligent code-completion and
access to operating system services. Here are some of the options available:

• Eclipse with the PyDev plugin, a popular free IDE;

• PyCharm, a cross-platform IDE with commercial and free editions;

• PythonAnywhere, an online Python environment with free and paid-for options
(https://www.pythonanywhere.com/);

• Spyder, an open source IDE for scientific programming in Python, which integrates
NumPy, SciPy, Matplotlib and IPython.


9.3.3 Version control 

Unless properly managed, larger software projects (in practice, anything consisting
of more than a single file of code) often rapidly descend into a tangle of modified
versions, experimental code, ad hoc features and temporary files. The management of
changes to the files comprising a software project is called version control (or revision
control).

At its simplest, version control can involve simply keeping code in a number of
parallel directories (folders), numbered chronologically as the software evolves. This
approach can work, but if a small change in a large amount of code leads to a new
version it is inefficient (a lot of unchanged code is copied across to the new directory).
If a new version is created only when the code changes a lot, then there is scope for a
lot of tangled code to be generated between versions.

To solve these problems, there are several version control software packages available,
some of which are listed here. Most of these run as standalone applications on an
operating system and can be invoked from the command line or used through a graphical
interface. Some advantages are as follows:

• Many developers can collaborate on one project; 

• Branching: the parallel development of two versions of the software at the same
time, for example, to test out new features;

• Tagging (or labeling): a way of referring to a snapshot of the project in a particular
state;

• Roll-back of a file in the project to a previous version;

• Cloning: a means of distributing a software project along with its history of
changes;

• Some version control systems integrate with online repositories for storing and
sharing code. The most famous of these is GitHub (https://github.com/).

We will not describe the working of version control systems in detail (the syntax
varies between systems and there are extensive tutorials, documentation and even entire
books written about each one). Some recommended options are:

• Git: the most widely adopted version control system, Git works on a distributed 
(or decentralized) basis, allowing developers to work on a project without sharing 



a common network or central reference code repository. Open source projects can
be hosted for free at GitHub.
http://git-scm.com/

• Mercurial: another distributed version control system.
http://mercurial.selenic.com/

• Subversion (SVN): a centralized option with free (for open source projects) hosting
at SourceForge (http://sourceforge.net/). As Git has gained in popularity, SVN
is not as widely used as it once was.
http://subversion.apache.org/


9.3.4 Unit tests 

Unit testing is a way of validating software by focusing on individual units of source
code. Python being an object-oriented programming language, this usually means that
individual classes (and sometimes even individual functions) are tested against a set of
trial data (some of which may be deliberately incorrect or malformed). The aim is to
catch any bugs which lead to the faulty interpretation of data. The set of unit tests also
serves as a documented and verifiable assertion that the code does what it is supposed to.
In some paradigms of code development, unit tests are written before the code itself.¹¹

An important advantage of unit testing is that it provides a means of assuring that
subsequent changes to the code (perhaps the addition of some functionality) do not
break it: the upgraded code should pass the same unit tests that the original code did.

Unit testing your own code for a small project takes discipline. The tests are, themselves,
computer code (and, perhaps, associated data) and need careful thought to write.
The devising of suitable unit tests often prompts the programmer to think more deeply
about the implementation of their code and can catch possible bugs before it is written.

Python’s unit testing framework is based around the unittest module: a simple
application is given in the following example.


Example E9.6 Suppose we want to write a function to convert a temperature between
the units Fahrenheit, Celsius and Kelvin (identified by the characters 'F', 'C' and 'K'
respectively). The six formulas involved are not difficult to code, but we might wish to
handle gracefully a couple of conditions that could arise in the use of this function: a
physically unrealizable temperature (< 0 K) or a unit other than 'F', 'C' or 'K'.

Our function will first convert to Kelvin and then to the units requested; if the
from-units and the to-units are the same for some reason, we want to return the
original value unchanged. The function convert_temperature is defined in the file
temperature_utils.py.

Listing 9.4 A function for converting between different temperature units 

# temperature_utils.py

def convert_temperature(value, from_unit, to_unit):
    """ Convert and return the temperature value from from_unit to to_unit. """

    # Dictionary of conversion functions from different units *to* K
    toK = {'K': lambda val: val,
           'C': lambda val: val + 273.15,
           'F': lambda val: (val + 459.67)*5/9,
          }
    # Dictionary of conversion functions *from* K to different units
    fromK = {'K': lambda val: val,
             'C': lambda val: val - 273.15,
             'F': lambda val: val*9/5 - 459.67,
            }

    # First convert the temperature from from_unit to K
    try:
        T = toK[from_unit](value)
    except KeyError:
        raise ValueError('Unrecognized temperature unit: {}'.format(from_unit))

    if T < 0:
        raise ValueError('Invalid temperature: {} {} is less than 0 K'
                         .format(value, from_unit))

    if from_unit == to_unit:
        # No conversion needed!
        return value

    # Now convert it from K to to_unit and return its value
    try:
        return fromK[to_unit](T)
    except KeyError:
        raise ValueError('Unrecognized temperature unit: {}'.format(to_unit))

11 In particular, so-called ‘extreme’ programming.


To use the unittest module to conduct unit tests on convert_temperature,
we write a new Python script defining a class, TestTemperatureConversion,
derived from the base unittest.TestCase class. This class defines methods that
act as tests of the convert_temperature function. These test methods should
call one of the base class's assertion functions to validate that the return value of
convert_temperature is as expected. For example,

self.assertEqual(<returned value>, <expected value>)

passes if the two values are exactly equal and otherwise marks the test as failed.
Other assertion functions exist to check that a specific exception is raised (e.g., by
invalid arguments) or that a returned value is True, False, None, and so on. The unit
test code for our convert_temperature function is here.

Listing 9.5 Unit tests for the temperature conversion function 

from temperature_utils import convert_temperature
import unittest

class TestTemperatureConversion(unittest.TestCase):

    def test_invalid(self):
        """
        There's no such temperature as -280 C, so convert_temperature should
        raise a ValueError.
        """
        self.assertRaises(ValueError, convert_temperature, -280, 'C', 'F')  # ➊

    def test_valid(self):
        """ A series of valid temperature conversions to test. """

        test_cases = [((273.16, 'K',), (0.01, 'C')),
                      ((-40, 'C'), (-40, 'F')),
                      ((450, 'F'), (505.3722222222222, 'K'))]

        for test_case in test_cases:
            ((from_val, from_unit), (to_val, to_unit)) = test_case
            result = convert_temperature(from_val, from_unit, to_unit)
            self.assertAlmostEqual(to_val, result)                          # ➋

    def test_no_conversion(self):
        """
        Ensure that if the from-units and to-units are the same the
        temperature is returned exactly as it was passed and not converted
        to and from Kelvin, which may cause loss of precision.
        """
        T = 56.67
        result = convert_temperature(T, 'C', 'C')
        self.assertEqual(result, T)                                         # ➌

    def test_bad_units(self):
        """ Check that ValueError is raised if invalid units are passed. """
        self.assertRaises(ValueError, convert_temperature, 0, 'C', 'R')
        self.assertRaises(ValueError, convert_temperature, 0, 'N', 'K')

unittest.main()

➊ assertRaises verifies that a specified exception is raised by the function
convert_temperature. The necessary arguments to this function are passed after the
function object itself.

➋ We need assertAlmostEqual here because the floating point arithmetic is likely
to cause a loss of precision due to rounding errors.

➌ We use assertEqual here to ensure that the temperature value is returned as it was
passed and not converted to and from Kelvin.

Running this script shows that our function passes its unit tests: 

$ python eg9-temperature-conversion-unittest.py 


Ran 4 tests in 0.000s


OK 
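The same kind of test can also be written with assertRaises used as a context manager and with subTest, which reports each case of a loop separately. The following is only a sketch: to_kelvin is a hypothetical, cut-down helper defined here so that the example is self-contained (it is not the convert_temperature function of Listing 9.4), and run_tests is likewise an illustrative name.

```python
import unittest

def to_kelvin(value, unit):
    """ Hypothetical helper: convert value in unit 'K', 'C' or 'F' to kelvin. """
    to_k = {'K': lambda v: v,
            'C': lambda v: v + 273.15,
            'F': lambda v: (v + 459.67) * 5 / 9}
    try:
        T = to_k[unit](value)
    except KeyError:
        raise ValueError('Unrecognized temperature unit: {}'.format(unit))
    if T < 0:
        raise ValueError('Invalid temperature: {} {}'.format(value, unit))
    return T

class TestToKelvin(unittest.TestCase):

    def test_invalid(self):
        # assertRaises as a context manager: the block must raise ValueError
        with self.assertRaises(ValueError):
            to_kelvin(-280, 'C')

    def test_valid(self):
        cases = [((0.01, 'C'), 273.16),
                 ((-40, 'F'), 233.15),
                 ((450, 'F'), 505.3722222222222)]
        for (value, unit), expected in cases:
            # subTest identifies which case failed without stopping the loop
            with self.subTest(value=value, unit=unit):
                self.assertAlmostEqual(to_kelvin(value, unit), expected)

def run_tests():
    """ Run the tests without calling unittest.main() (which exits Python). """
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestToKelvin)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_tests() returns a result object whose wasSuccessful() method reports whether every test passed.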
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9.3.5 Further Reading 

• F. Brooks, (1975, 1995). The Mythical Man-Month, Addison-Wesley. Near-legendary
monograph on software development explaining why "adding manpower to a late
software project makes it later."

• J. Loeliger and M. McCullough, (2012). Version Control with Git, O'Reilly.

• S. McConnell, (2004). Code Complete: A Practical Handbook of Software
Construction, Microsoft Press.

• A. Hunt and D. Thomas, (1999). The Pragmatic Programmer, Addison-Wesley.
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Appendix A Solutions 


Answers to selected questions are given here. For further exercises and solutions, see
scipython.com.

Q2.2.5 This question illustrates the danger of "wildcard" imports: the value of the
variable e=2 is replaced by the definition of e in the math module. The expression
d ** e therefore raises 8 to the power of e = 2.71828... instead of squaring it.

Q2.2.7 Using Python’s operators:

>>> a = 2
>>> b = 6
>>> 3 * (a**3*b - a*b**3) % 7
3
>>> a = 3
>>> b = 5
>>> 3 * (a**3*b - a*b**3) % 7
1


Q2.2.8 The thickness of the paper on the nth fold is $2^n t$, so we require
$2^n t \geq d \Rightarrow n_\mathrm{min} = \lceil \log_2(d/t) \rceil$.

>>> import math
>>> d = 384400 * 1.e3     # distance to moon, m
>>> t = 1.e-4             # paper thickness, m
>>> math.log(d / t, 2)    # base-2 logarithm
41.805745474760016

Hence the paper must be folded 42 times to reach the moon ($\lceil x \rceil$ denotes the
ceiling of x: the smallest integer not less than x).
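The ceiling can also be taken in code rather than by inspection. This is a minimal sketch using math.ceil and math.log2 from the standard library; the function name folds_needed is our own choice for illustration.

```python
import math

def folds_needed(distance, thickness):
    """ Return the smallest integer n for which 2**n * thickness >= distance. """
    return math.ceil(math.log2(distance / thickness))

d = 384400 * 1.e3     # distance to moon, m
t = 1.e-4             # paper thickness, m
n = folds_needed(d, t)
print(n)
```

A quick check is that 2**n * t is at least d, whereas 2**(n-1) * t is not.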

Q2.2.10 The ^ operator does not raise a number to another power (that is the **
operator). It is the bitwise xor operator, and in binary 10^2 is 1010 xor 0010 =
1000, which is 8 in decimal.
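The difference between the two operators can be seen by formatting the operands in binary. This short sketch uses only built-in operators and the format function; the variable names are illustrative.

```python
a, b = 10, 2

xor_result = a ^ b       # bitwise exclusive or
power_result = a ** b    # exponentiation

# Show the operands and the xor result as 4-bit binary strings
print(format(a, '04b'), 'xor', format(b, '04b'), '=', format(xor_result, '04b'))
print(xor_result, power_result)
```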

Q2.3.1 Slice the string s='seehemewe' as follows (other solutions are possible in
some cases):

a. s[:3]

b. s[3:5]

c. s[5:7]

d. s[7:]

e. s[3:6]

f. s[5:2:-1]

g. s[-2::-3]

Q2.3.2 Simply slice the string backward and compare with the original:

>>> s = 'banana'
>>> s == s[::-1]
False
>>> s = 'deified'
>>> s == s[::-1]
True


Q2.3.5 This is not the correct way to test if the string s is equal to either 'ham'
or 'eggs'. The expression ('eggs' or 'ham') is a boolean one in which both
arguments, being nonempty strings, evaluate to True. The expression short-circuits
at the first True equivalent and this operand is returned (see Section 2.2.4): that is,
('eggs' or 'ham') returns 'eggs'. Because s is, indeed, the string 'eggs' the
equality comparison returns True. However, if the order of the operands is swapped,
the boolean or again short-circuits at the first True-equivalent, which is now 'ham'
and returns it. The equality comparison with s fails, and the result is False.

There are two correct ways to test if s is one of two or more strings: 

>>> s = 'eggs' 

>>> s == 'ham' or s == 'eggs' 

True 

>>> s in ('ham', 'eggs') 

True 


(See Section 2.4.2 for more information about the syntax of the second statement.) 
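The short-circuit behavior described above can be confirmed directly. In this sketch the two function names are hypothetical and serve only to contrast the incorrect test with a correct one.

```python
def is_ham_or_eggs_wrong(s):
    # ('eggs' or 'ham') evaluates to 'eggs', so this only tests s == 'eggs'
    return s == ('eggs' or 'ham')

def is_ham_or_eggs(s):
    # Membership test against a tuple of the allowed strings
    return s in ('ham', 'eggs')

print('eggs' or 'ham')
print(is_ham_or_eggs_wrong('ham'), is_ham_or_eggs('ham'))
```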

Q2.4.2 The problem is that enumerate, by default, returns the indexes and items
of the sequence passed to it with the indexes starting at 0. The sequence passed to it
is the slice P[1:] = [5, 0, 2] and so enumerate generates, in turn, the tuples (0, 5),
(1, 0) and (2, 2). However, for our derivative we need the indexes into the original
list, P, giving (1, 5), (2, 0) and (3, 2). There are two alternatives: pass the
optional argument start=1 to enumerate or add 1 to the default index:

>>> P = [4, 5, 0, 2]
>>> dPdx = []
>>> for i, c in enumerate(P[1:], start=1):
...     dPdx.append(i*c)
>>> dPdx
[5, 0, 6]

>>> P = [4, 5, 0, 2] 

>>> dPdx = [] 

>>> for i, c in enumerate(P[1:]): 

... dPdx.append((i+1)*c) 

>>> dPdx 
[5, 0, 6] 
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Q2.4.3 Here is one solution: 

>>> scores = [87, 75, 75, 50, 32, 32] 

>>> ranks = [] 

>>> for score in scores: 

... ranks.append(scores.index(score) + 1) 

>>> ranks 

[1, 2, 2, 4, 5, 5] 

Q2.4.4 The following calculates π to 10 decimal places.

>>> import math
>>> pi = 0
>>> for k in range(20):
...     pi += pow(-3, -k) / (2*k+1)      # ➊
>>> pi *= math.sqrt(12)
>>> print('pi =', pi)
pi = 3.1415926535714034
>>> print('error =', abs(pi - math.pi))
error = 1.8389734179891093e-11

➊ The built-in pow(x, j) is equivalent to (x)**j.

Q2.4.5 any(x) and not all(x) is True if at least one item in x is equivalent to
True but not all of them:

>>> x1, x2, x3 = [False, False], [1, 2, 3, 4], [1, 2, 3, 0]
>>> any(x1) and not all(x1)
False
>>> any(x2) and not all(x2)
False
>>> any(x3) and not all(x3)
True


Q2.4.6 Recall that the * operator unpacks a tuple into a positional argument list
to a function. If z = zip(a, b), then z is the (iterator) sequence (a0, b0), (a1, b1),
(a2, b2), .... Unpacking this sequence in the call zip(*z) is equivalent to calling
zip with these tuples as arguments:

zip((a0, b0), (a1, b1), (a2, b2), ...)

zip takes the first and second items from each tuple in turn, reproducing the original
sequences:

(a0, a1, a2, ...), (b0, b1, b2, ...)
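A concrete round trip makes the unpacking easier to follow. In this sketch the lists a and b are arbitrary illustrative data; the zip object is converted to a list first because an iterator can only be consumed once.

```python
a = [1, 2, 3]
b = ['x', 'y', 'z']

pairs = list(zip(a, b))    # [(1, 'x'), (2, 'y'), (3, 'z')]
a2, b2 = zip(*pairs)       # unpack the pairs as separate arguments to zip

print(pairs)
print(a2, b2)
```

Note that the recovered sequences are tuples rather than lists.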

Q2.4.7 Simply zip the lists of sunshine hours and month names together and
reverse-sort the resulting list of tuples:

>>> months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
...           'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
>>> sun = [44.7, 65.4, 101.7, 148.3, 170.9, 171.4,
...        176.7, 186.1, 133.9, 105.4, 59.6, 45.8]
>>> for s, m in sorted(zip(sun, months), reverse=True):
...     print('{}: {:.1f} hrs'.format(m, s))
Aug: 186.1 hrs
Jul: 176.7 hrs
Jun: 171.4 hrs
May: 170.9 hrs
Apr: 148.3 hrs
Sep: 133.9 hrs
Oct: 105.4 hrs
Mar: 101.7 hrs
Feb: 65.4 hrs
Nov: 59.6 hrs
Dec: 45.8 hrs
Jan: 44.7 hrs


Q2.5.1 To normalize a list: 

>>> a = [2,4,10,6,8,4] 

>>> amin, amax = min(a), max(a) 

>>> for i, val in enumerate(a): 

... a[i] = (val-amin) / (amax-amin) 

>>> a 

[0.0, 0.25, 1.0, 0.5, 0.75, 0.25] 

Q2.5.2 The following code calculates Gauss’s constant to 14 decimal places.

>>> import math
>>> tol = 1.e-14
>>> an, bn = 1., math.sqrt(2)
>>> while abs(an - bn) > tol:
...     an, bn = (an + bn) / 2, math.sqrt(an * bn)
>>> print('G = {:.14f}'.format(1/an))
G = 0.83462684167407

Q2.5.3 The following code produces the first 100 "fizzbuzz" numbers.

nmax = 100
for n in range(1, nmax+1):
    message = ''
    if not n % 3:
        message = 'fizz'
    if not n % 5:
        message += 'buzz'
    print(message or n)      # ➊

➊ Note that if n is not divisible by either 3 or 5, message will be the empty string,
which evaluates to False in this logical expression, so n is printed instead.

Q2.5.4 Here’s one solution, using stoich='C8H18' as an example:

Listing A.1 The structural formula of a straight-chain alkane

# qn2-5-c-alkane-a.py
stoich = 'C8H18'

fragments = stoich.split('H')
nC = int(fragments[0][1:])
nH = int(fragments[1])
if nH != 2*nC + 2:
    print('{} is not an alkane!'.format(stoich))
else:
    print('H3C', end='')
    for i in range(nC-2):
        print('-CH2', end='')
    print('-CH3')

The output is:

H3C-CH2-CH2-CH2-CH2-CH2-CH2-CH3

Q2.7.1 Only (b) and (f) behave as intended: 

a. In the absence of an explicit return statement, the line function returns None.
Because None cannot be joined into a string, an error occurs:

my_sum = '\n'.join([' 56', ' +44', line(), ' 100', line()])
TypeError: sequence item 2: expected str instance, NoneType found

b. This code works as intended.

c. The function line returns a string, as required, but is not called as line():
without the parentheses, line refers to the function object itself, which cannot
be joined in a string, so an error occurs:

my_sum = '\n'.join([' 56', ' +44', line, ' 100', line])
TypeError: sequence item 2: expected str instance, function found

d. This code does not cause an error, but outputs a string representation of the
function instead of the string returned when the function is called:

 56
+44
<function line at 0x103d9e9e0>
100
<function line at 0x103d9e9e0>

e. This code generates unwanted None output: 

56 

+44 


None 

100 


None 

This happens because the statement print(line()) calls the function line,
which prints a line of hyphens but also prints its return value (which is None
since it doesn’t return anything else explicitly).

f. This code works as intended. 
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Q2.7.2 The problem is within the add_interest function: 

def add_interest(balance, rate):
    balance += balance * rate / 100

This creates a new float object, balance, local to the function, which is independent
of the original balance object. When the function exits, the local balance is destroyed
and the original balance never updated. One fix would be to return the updated
balance value from the function:

>>> balance = 100
>>> def add_interest(balance, rate):
...     balance += balance * rate / 100
...     return balance
...


>>> for year in range(4): 

... balance = add_interest(balance, 5) 

... print('Balance after year {}: ${:.2f}'.format(year+1, balance)) 

Balance after year 1: $105.00 

Balance after year 2: $110.25 

Balance after year 3: $115.76 

Balance after year 4: $121.55 
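The distinction between rebinding a local name and mutating a shared object can be illustrated as follows. This is a sketch only: the dictionary-based account and the function add_interest_to_account are hypothetical and are not part of the original question.

```python
def add_interest(balance, rate):
    # Rebinds the *local* name balance; the caller's object is untouched
    balance += balance * rate / 100

def add_interest_to_account(account, rate):
    # Mutates the dictionary object that the caller also refers to
    account['balance'] += account['balance'] * rate / 100

balance = 100
add_interest(balance, 5)

account = {'balance': 100}
add_interest_to_account(account, 5)

print(balance, account['balance'])
```

Numbers are immutable, so only the second function has an effect that is visible to the caller.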

Q2.7.3 The problem is that the function digit_sum does not return the sum of
the digits of n that it has calculated. In the absence of an explicit return statement, a
Python function returns None, but None isn’t an acceptable object to use in a modulus
calculation and so a TypeError is raised.

The fix is simply to add return dsum:

def digit_sum(n):
    """ Find and return the sum of the digits of integer n. """

    s_digits = list(str(n))
    dsum = 0
    for s_digit in s_digits:
        dsum += int(s_digit)
    return dsum

def is_harshad(n):
    return not n % digit_sum(n)

Now, as expected: 

>>> is_harshad(21) 

True 


Q4.1.1 It is a good idea to keep the try block as small as possible to prevent
exceptions that you do not want to catch being caught instead of the one you do. For
example, in Example E4.5, suppose we read the file after opening it within the same
try block:

try:
    fi = open(filename, 'r')
    lines = fi.readlines()
except IOError:
    ...

Now there are two errors that could give rise to an IOError exception being raised:
failure to open the file and failure to read its lines. The except clause is intended to
handle the first case, but it will also be executed in the second case when it would
be more appropriate to handle it differently (or leave it unhandled and stop program
execution).

Q4.1.2 The point of finally in Example E4.5 is that statements in this block get
executed before the function returns. If the line

print('Done with file {}'.format(filename))

were moved to after the try block, it would not be executed if an IOError exception
is raised (because the function would have returned to its caller before this print
statement is encountered).
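The order of events can be recorded explicitly. The function below is a hypothetical stand-in written for this illustration (it is not the code of Example E4.5), and the filename is assumed not to exist; messages are appended to a list so that the sequence can be inspected.

```python
log = []

def read_first_line(filename):
    """ Return the first line of filename, or None if it cannot be opened. """
    try:
        fi = open(filename, 'r')
    except IOError:
        log.append('Could not open {}'.format(filename))
        return None
    else:
        line = fi.readline()
        fi.close()
        return line
    finally:
        # Executed before the function returns, whichever branch was taken
        log.append('Done with file {}'.format(filename))

result = read_first_line('no-such-file-for-this-example.txt')
print(result, log)
```

Even though the except clause returns, the finally block still runs before control passes back to the caller.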

Q4.2.1 This can easily be achieved with a set. Given the string, s: 

set(s.lower()) >= set('abcdefghijklmnopqrstuvwxyz') 

is True if it is a pangram. For example, 

>>> s = 'The quick brown fox jumps over the lazy dog' 

>>> set(s.lower()) >= set('abcdefghijklmnopqrstuvwxyz') 

True 

>>> s = 'The quick brown fox jumped over the lazy dog' 

>>> set(s.lower()) >= set('abcdefghijklmnopqrstuvwxyz') 

False 


Q4.2.2 This function can be used to remove duplicates from an ordered list.

>>> def remove_dupes(l):
...     return sorted(set(l))
...
>>> remove_dupes([1,1,2,3,4,4,4,5,7,8,8,9])
[1, 2, 3, 4, 5, 7, 8, 9]

Note that although sets don’t have an order, they are iterable and can be passed to the
sorted() built-in function (which returns a list).
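If the list is not already sorted and the original order of first appearance matters, a different technique is needed. One possibility, sketched here with a function name of our own choosing, relies on dictionaries preserving insertion order (guaranteed from Python 3.7).

```python
def remove_dupes_keep_order(items):
    """ Remove duplicates, keeping the first occurrence of each item. """
    return list(dict.fromkeys(items))

data = [3, 1, 3, 2, 1]
print(remove_dupes_keep_order(data))   # order of first appearance
print(sorted(set(data)))               # sorted order, as in the solution above
```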

Q4.2.3 From within the Python interpreter: 

>>> set('hellohellohello') 

{'h', 'o', 'l', 'e'}

>>> set(['hellohellohello']) 

{'hellohellohello'} 

>>> set(('hellohellohello')) 

{'h', 'o', 'l', 'e'}

>>> set(('hellohellohello',)) 

{'hellohellohello'} 

>>> set(('hello', 'hello', 'hello')) 

{'hello'} 

>>> set(('hello', ('hello', 'hello'))) 

{'hello', ('hello', 'hello')} 

>>> set(('hello', ['hello', 'hello'])) 
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Traceback (most recent call last): 

File "<stdin>", line 1, in <module> 

TypeError: unhashable type: 'list' 

Note the difference between initializing a set with a list of objects and attempting to 
add a list as an object in a set. 

Q4.2.4 Note that the statement 

>>> a |= {2,3,4,5} 

does not change the frozenset but rather creates a new one from the union of the old
one and the set {2, 3, 4, 5}. (In the same way, we have seen that for int object i,
the assignment i = i + 1 rebinds the label i to a new integer object with value i+1
rather than changing the value of the immutable int object previously bound to i.)

Q4.3.1 The list comprehension 

>>> flist = [lambda x, i=i: x**i for i in range(4)] 

creates the same list of anonymous functions as that in Example E4.10. 

Note that we need to pass each i into the lambda function explicitly or else Python’s 
closure rules will lead to every lambda function being equivalent to x**3 (3 being the 
final value of i in the loop). 
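The effect of omitting the default argument can be seen by evaluating both versions at x = 2. This is a small sketch; the names late and bound are illustrative.

```python
# Each lambda looks up i when it is *called*, by which time i is 3
late = [lambda x: x**i for i in range(4)]

# The default argument i=i captures the value of i when each lambda is *defined*
bound = [lambda x, i=i: x**i for i in range(4)]

late_values = [f(2) for f in late]
bound_values = [f(2) for f in bound]
print(late_values, bound_values)
```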

Q4.3.2 The code snippet outputs the first nmax+1 rows of Pascal’s Triangle:

[1]
[1, 1]
[1, 2, 1]
[1, 3, 3, 1]
[1, 4, 6, 4, 1]
[1, 5, 10, 10, 5, 1]

In the list comprehension assignment,

x = [([0]+x)[i] + (x+[0])[i] for i in range(n+1)]

the elements of two lists are added. The two lists are formed from the list representing
the previous row by, in the first case, adding a 0 to the beginning of the list, and in the
second case, by adding a 0 to the end of the list. In this way, the sum is taken over
neighboring pairs of numbers, with the end numbers unchanged. For example, if x is
[1, 3, 3, 1], the next row is formed by summing the elements in the lists

[0, 1, 3, 3, 1]
[1, 3, 3, 1, 0]

which yields the required [1, 4, 6, 4, 1].

Q4.3.3 

a. Index the items of a using the elements of b:

>>> [a[x] for x in b]
['E', 'C', 'G', 'B', 'F', 'A', 'D']

b. Index the items of a using the sorted elements of b. In this case, the returned list
is just (a copy of) a:

>>> [a[x] for x in sorted(b)]
['A', 'B', 'C', 'D', 'E', 'F', 'G']

c. Index the items of a using the elements of b indexed at the elements of b(!):

>>> [a[b[x]] for x in b]
['F', 'G', 'D', 'C', 'A', 'E', 'B']

d. Associate each element of b with the corresponding element of a in a sequence
of tuples: [(4, 'A'), (2, 'B'), (6, 'C'), ...], which is then sorted; this method is
used to return the elements of a corresponding to the ordered elements of b.

>>> [x for (y,x) in sorted(zip(b,a))]
['F', 'D', 'B', 'G', 'A', 'E', 'C']


Q4.3.4 To return a sorted list of (key, value) pairs from a dictionary:

>>> d = {'five': 5, 'one': 1, 'four': 4, 'two': 2, 'three': 3}
>>> d
{'four': 4, 'one': 1, 'five': 5, 'two': 2, 'three': 3}
>>> sorted([(k, v) for k, v in d.items()])
[('five', 5), ('four', 4), ('one', 1), ('three', 3), ('two', 2)]

Note that sorting the list of (key, value) tuples requires that the keys all have data types
that can be meaningfully ordered. This approach will not work, for example, if the keys
are a mixture of integers and strings since (in Python 3) there is no defined order to sort
these types into: a TypeError: unorderable types: int() < str() exception
will be raised.

To sort by value we could sort a list of (value, key) tuples, but to keep the returned
list as (key, value) pairs, use

>>> sorted([(k, v) for k, v in d.items()], key=lambda item: item[1])
[('one', 1), ('two', 2), ('three', 3), ('four', 4), ('five', 5)]

The key argument to sorted specifies how to interpret each item in the list for ordering:
here we want to order by the second entry (item[1]) in each (k, v) tuple to order by
value.

Q4.3.5 The following code encrypts (and decrypts) a telephone number held as a
string using the "jump the 5" method.

''.join(['5987604321'[int(i)] if i.isdigit() else '-' for i in '555-867-5309'])

Q6.1.1 An np.ndarray is a NumPy class for representing multidimensional arrays
in Python; in this book, we often refer to instances of this class simply as array objects.
np.array is a function that constructs such objects from its arguments (usually a
sequence).

Q6.1.2 To create a two-dimensional array, array() must be passed a sequence of
sequences as a single argument: this call passes three sequence arguments instead. The
correct call is

>>> np.array( ((1,0,0), (0,1,0), (0,0,1)), dtype=float)

Q6.1.3 np.array([0,0,0]) creates a one-dimensional array with three elements;
a = np.array([[0,0,0]]) creates a 1 x 3 two-dimensional array (i.e., a[0] is the
one-dimensional array created in the first example).

Q6.1.4 Changing an array’s type by setting dtype directly does not alter the data
at the byte level, only how those data are interpreted as a number, string, and so on.
As it happens, the byte representations of zero are the same for integers (int64) and
floats (float64), so the result of setting dtype is as expected. However, the 8 bytes
representing 1.0 translate to the integer 4607182418800017408. To convert the data
type properly, use astype(), which returns a new array (with its own data):

In [x]: a = np.ones((3,))
In [x]: a
Out[x]: array([ 1.,  1.,  1.])
In [x]: a = a.astype('int')
In [x]: a
Out[x]: array([1, 1, 1])
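The reinterpretation of a float's bytes as an integer can be reproduced without NumPy. The sketch below uses the standard library struct module to pack a double-precision float into 8 bytes and unpack the same bytes as a signed 64-bit integer; float_bits_as_int is a name chosen here for illustration.

```python
import struct

def float_bits_as_int(x):
    """ Reinterpret the 8 bytes of the double x as a signed 64-bit integer. """
    return struct.unpack('<q', struct.pack('<d', x))[0]

for value in (0.0, 0.5, 1.0):
    print(value, float_bits_as_int(value))
```

Only 0.0 maps to the integer with the same numerical value; other floats map to large integers determined by their exponent and mantissa bits.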


Q6.1.5 Indexing and slicing a NumPy array:

a. a[1,0,3]

b. a[0,2,:] (or just a[0,2])

c. a[2,...] (or a[2,:,:] or a[2])

d. a[:,1,:2]

e. a[2,:,1:-1] ("in the third block, for each row take the items in the middle
two columns").

f. a[:,::-1,0] ("for each block, traverse the rows backward and take the item in
the first column of each").

g. Defining the three 2x2 index arrays for the blocks, rows and columns locating
our elements as follows:

ia = np.array([[0, 0], [2, 2]])
ja = np.array([[0, 0], [3, 3]])
ka = np.array([[0, 3], [0, 3]])

a[ia, ja, ka] returns the desired result.


Q6.1.6 For example,

In [x]: a = np.array([0, -1, 4.5, 0.5, -0.2, 1.1])
In [x]: a[abs(a)<=1]
Out[x]: array([ 0. , -1. ,  0.5, -0.2])



Q6.1.7 In the following code:

In [x]: a, b = -2.00231930436153, -2.0023193043615
In [x]: np.isclose(a, b, atol=1.e-14)
Out[x]: True

np.isclose() returns True because although the absolute difference between the
two numbers is greater than $10^{-14}$, it is (significantly) less than rtol * abs(b), the
contribution from the default relative difference. To obtain the expected behavior, set
rtol to 0:

In [x]: np.isclose(-2.00231930436153, -2.0023193043615, atol=1.e-14, rtol=0)
Out[x]: False
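The criterion documented for np.isclose is abs(a - b) <= atol + rtol * abs(b), with defaults rtol=1.e-5 and atol=1.e-8. The following pure-Python sketch of that inequality for scalars (not NumPy's own implementation, which also handles arrays, infinities and NaN) shows how the relative term dominates here.

```python
def isclose(a, b, rtol=1.e-5, atol=1.e-8):
    """ Scalar sketch of the tolerance test documented for np.isclose. """
    return abs(a - b) <= atol + rtol * abs(b)

a, b = -2.00231930436153, -2.0023193043615

print(abs(a - b))                 # the absolute difference, about 3e-14
print(1.e-5 * abs(b))             # the default relative contribution, about 2e-5
print(isclose(a, b, atol=1.e-14))
print(isclose(a, b, atol=1.e-14, rtol=0))
```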


Q6.1.8 The different behavior here is due to the finite precision with which real
numbers are stored: double-precision floating point numbers are only represented to the
equivalent of about 15 decimal places and so the two numbers being compared here are
the same to within this precision:

In [x]: 3.1415926535897932 - 3.141592653589793
Out[x]: 0.0


Q6.1.9 For example,

In [x]: N = 5
In [x]: Nsq = N**2
In [x]: np.allclose(np.sort(magic_square.flatten()),
                    np.linspace(1, Nsq, Nsq).astype(int))
Out[x]: True

In [x]: Nsum = N * (N**2 + 1) // 2
In [x]: np.allclose(np.sum(magic_square, axis=0), Nsum)
Out[x]: True

In [x]: np.allclose(np.sum(magic_square, axis=1), Nsum)
Out[x]: True

In [x]: np.allclose(np.sum(np.diag(magic_square)), Nsum)
Out[x]: True

In [x]: np.allclose(np.sum(np.diag(np.fliplr(magic_square))), Nsum)    # ➊
Out[x]: True

➊ np.fliplr flips the array in the left/right direction. An alternative way to get this
"other" diagonal is with a.ravel()[N-1:-N+1:N-1].

Q6.1.10 The following statement will determine if a sequence a is (strictly) increasing
or not:

np.all(np.diff(a) > 0)


Q6.1.11 In the first case, a single object is created of the requested dtype and
multiplied by a scalar (regular Python int). Python "upcasts" to return the result in
a dtype that can hold it:

In [x]: x = np.uint8(250)
In [x]: type(x*2)
Out[x]: numpy.int64

However, an ndarray, because it has a fixed byte size, cannot be upcast in the same
way: its own dtype takes precedence over that of the scalar multiplying it, and so the
multiplication is carried out modulo 256.

Compare this with the result of multiplying two scalars with the same dtype:

In [x]: np.uint8(250) * np.uint8(2)
Out[x]: 244 # (of type np.uint8)

(You may also see a warning: RuntimeWarning: overflow encountered in
ubyte_scalars.)
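The value 244 follows from keeping only the lowest 8 bits of the true product. A minimal pure-Python sketch of this wrap-around arithmetic (the function name uint8_multiply is our own and does not belong to NumPy) is:

```python
def uint8_multiply(x, y):
    """ Multiply two integers as an 8-bit unsigned type would: modulo 256. """
    return (x * y) % 256

print(250 * 2)                  # the true product, which needs more than 8 bits
print(uint8_multiply(250, 2))   # what survives in 8 bits
print(uint8_multiply(100, 2))   # no overflow: the result fits in 8 bits
```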

Q6.4.1 The Polynomial deriv method returns a Polynomial object (in this case
with a single term, the coefficient of $x^0$, equal to 18). This object is not equal to the
integer object with value 18.

Q6.4.2 Using numpy.polynomial.Polynomial,

In [x]: p1 = Polynomial([-11,1,1])
In [x]: p2 = Polynomial([-7,1,1])
In [x]: p = p1**2 + p2**2
In [x]: dp = p.deriv()      # first derivative
In [x]: stationary_points = dp.roots()
In [x]: ddp = dp.deriv()    # second derivative
In [x]: minima = stationary_points[ddp(stationary_points) > 0]
In [x]: maxima = stationary_points[ddp(stationary_points) < 0]
In [x]: inflections = stationary_points[np.isclose(ddp(stationary_points),0)]

In [x]: print(np.array((minima, p(minima))).T)
[[-3.54138127  8.        ]
 [ 2.54138127  8.        ]]

In [x]: print(np.array((maxima, p(maxima))).T)
[[  -0.5    179.125]]

In [x]: print(np.array((inflections, p(inflections))).T)
[]

That is, the function has two minima,

f(-3.54138127) = 8
f(2.54138127) = 8

one maximum,

f(-0.5) = 179.125

and no points of inflection / undulation.

Q6.5.1 Without overcomplicating things,

In [x]: pauli_matrices = np.array((
            ((0, 1), (1, 0)),
            ((0, -1j), (1j, 0)),
            ((1, 0), (0, -1))
        ))
In [x]: I2 = np.eye(2)
In [x]: for sigma in pauli_matrices:
   ...:     print(np.allclose(sigma.T.conj().dot(sigma), I2))
True
True
True

Q6.5.2 The following code fits the coefficients to the required quadratic equation.
Note that this is a linear least squares fit even though the function is nonlinear in time
because it is linear with respect to the coefficients.

# qn6-9-b-quadratic-fit-a.py
import numpy as np
import pylab

Polynomial = np.polynomial.Polynomial

x = np.array([1.3, 6.0, 20.2, 43.9, 77.0, 119.6, 171.7, 233.2, 304.2,
              384.7, 474.7, 574.1, 683.0, 801.3, 929.2, 1066.4, 1213.2,
              1369.4, 1535.1, 1710.3, 1894.9])
dt, n = 0.1, len(x)
tmax = dt * (n-1)
t = np.linspace(0, tmax, n)

A = np.vstack((np.ones(n), t, t**2)).T
coefs, resid, _, _ = np.linalg.lstsq(A, x)

# Initial position (cm) and speed (cm.s-1), acceleration due to gravity (m.s-2)
x0, v0, g = coefs[0], coefs[1], coefs[2] * 2 / 100
print('x0 = {:.2f} cm, v0 = {:.2f} cm.s-1, g = {:.2f} m.s-2'.format(x0, v0, g))

xfit = Polynomial(coefs)(t)
pylab.plot(t, x, 'ko')
pylab.plot(t, xfit, 'r')
pylab.xlabel('Time (sec)')
pylab.ylabel('Distance (cm)')
pylab.show()

The fitted function is shown in Figure A.1.

Q6.7.1 The first case,

In [x]: a = np.array([6,6,6,7,7,7,7,7,7])
In [x]: a[np.random.randint(len(a), size=5)]
Out[x]: array([7, 7, 7, 6, 7]) # (for example)

samples randomly from the array a with replacement: for each item selected the
probability of a 6 is 1/3 and the probability of a 7 is 2/3.

In the second case,

In [x]: np.random.randint(6, 8, 5)
Out[x]: array([6, 6, 7, 7, 7]) # (for example)

the numbers are drawn from [6, 7] uniformly, so the probability of each number being
selected is 1/2.

Q6.7.2 The function np.random.randint samples uniformly from the half-open
interval, [low, high), so to get the equivalent behavior to np.random.random_integers
in Example E6.16 we need:

In [x]: a, b, n = 0.5, 3.5, 4
In [x]: a + (b-a) * (np.random.randint(1, n+1, size=10) - 1) / (n-1)
Out[x]: array([ 0.5,  1.5,  0.5,  3.5,  1.5,  3.5,  2.5,  0.5,  1.5,  1.5])


Q6.7.3 The probability of winning is one in

$$\binom{75}{5} \cdot 15 = \frac{75 \cdot 74 \cdot 73 \cdot 72 \cdot 71}{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5} \cdot 15 = 258890850.$$

To pick five random numbers from 1-75 and one from 1-15:

In [x]: (sorted(np.random.choice(np.arange(1,76), 5, replace=False)),
         np.random.randint(15)+1)
Out[x]: ([4, 21, 35, 36, 64], 14)
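The count of possible tickets can be checked with exact integer arithmetic. This sketch uses math.comb (available from Python 3.8) and, as an alternative to the NumPy call, the standard library random module for the draw; the variable names are illustrative.

```python
import math
import random

# Number of equally likely tickets: choose 5 from 75, then 1 from 15
odds = math.comb(75, 5) * 15
print(odds)

# One random ticket: five distinct numbers from 1-75 and one from 1-15
main_numbers = sorted(random.sample(range(1, 76), 5))
bonus = random.randint(1, 15)      # randint includes both end points
print(main_numbers, bonus)
```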


Q6.7.4 Here is a more general solution to the problem. Draw the distribution of
misprints across the book from the binomial distribution using np.random.binomial
and count up how many pages have q or more misprints on them. To compare with
the Poisson distribution, for the number of misprints on a page, X, we must calculate
Pr(X >= q) = 1 - Pr(X < q) = 1 - (Pr(X = 0) + Pr(X = 1) + ... + Pr(X = q-1)):

Listing A.2 Calculating the probability of q or more misprints on a given page of a book.

# qn6-7-d-misprints-a.py
import numpy as np

n, m = 500, 400
q = 2
ntrials = 100

errors_per_page = np.random.binomial(m, 1/n, (ntrials, n))
av_ge_q = np.sum(errors_per_page>=q) / n / ntrials
print('Probability of {} or more misprints on a given page'.format(q))
print('Result from {} trials using binomial distribution: {:.6f}'
      .format(ntrials, av_ge_q))

# Now calculate the same quantity using the Poisson approximation,
# Pr(X>=q) = 1 - exp(-lam)[1 + lam + lam**2/2! + ... + lam**(q-1)/(q-1)!]
lam = m/n
poisson = 1
term = 1
for k in range(1,q):
    term *= lam/k
    poisson += term
poisson = 1 - np.exp(-lam) * poisson
print('Result from Poisson distribution: {:.6f}'.format(poisson))

A sample output is

Probability of 2 or more misprints on a given page
Result from 100 trials using binomial distribution: 0.190200
Result from Poisson distribution: 0.191208
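For a small value of q the binomial probability can also be summed exactly instead of being estimated by random sampling. This is a sketch of that alternative technique using only the standard library (math.comb needs Python 3.8 or later); the two function names are our own.

```python
import math

def prob_at_least_binomial(q, m, p):
    """ Pr(X >= q) for X ~ Binomial(m, p), from 1 - sum of Pr(X = k), k < q. """
    below = sum(math.comb(m, k) * p**k * (1 - p)**(m - k) for k in range(q))
    return 1 - below

def prob_at_least_poisson(q, lam):
    """ Pr(X >= q) for X ~ Poisson(lam), from 1 - sum of Pr(X = k), k < q. """
    below = sum(lam**k / math.factorial(k) for k in range(q))
    return 1 - math.exp(-lam) * below

n, m, q = 500, 400, 2          # pages, misprints, threshold
exact = prob_at_least_binomial(q, m, 1/n)
approx = prob_at_least_poisson(q, m/n)
print('{:.6f} {:.6f}'.format(exact, approx))
```

The two values agree closely because the number of misprints is large and the probability of any one of them landing on a given page is small.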

Q6.8.1 The two methods for calculating the DFT can be timed using the IPython
%timeit magic function:

In [x]: import numpy as np
In [x]: n = 512
In [x]: # Our input function is just random numbers
In [x]: f = np.random.rand(n)
In [x]: # Time the NumPy (Cooley-Tukey) DFT algorithm
In [x]: %timeit np.fft.fft(f)
100000 loops, best of 3: 13.1 us per loop
In [x]: # Now calculate the DFT by direct summation
In [x]: k = np.arange(n)
In [x]: m = k.reshape((n, 1))
In [x]: w = np.exp(-2j * np.pi * m * k / n)
In [x]: %timeit np.dot(w, f)
1000 loops, best of 3: 354 us per loop
In [x]: # Check the two methods produce the same result
In [x]: ftfast = np.fft.fft(f)
In [x]: ftslow = np.dot(w, f)
In [x]: np.allclose(ftfast, ftslow)
Out[x]: True

The Cooley-Tukey algorithm is found to be almost 30 times faster than the direct
method. In fact, this algorithm can be shown to scale as $O(n \log n)$ compared with
$O(n^2)$ for direct summation.
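The origin of the different scaling can be seen in a pure-Python sketch of the two approaches. The recursive function below is a textbook radix-2 decimation-in-time version of the Cooley-Tukey idea and assumes that the input length is a power of 2; it is written for illustration and is not NumPy's implementation.

```python
import cmath
import random

def dft_direct(f):
    """ DFT by direct summation: n terms for each of n outputs, O(n^2). """
    n = len(f)
    return [sum(f[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]

def fft_recursive(f):
    """ Radix-2 Cooley-Tukey FFT; len(f) must be a power of 2. O(n log n). """
    n = len(f)
    if n == 1:
        return list(f)
    even = fft_recursive(f[0::2])    # DFT of the even-indexed samples
    odd = fft_recursive(f[1::2])     # DFT of the odd-indexed samples
    out = [0] * n
    for m in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * m / n) * odd[m]
        out[m] = even[m] + twiddle
        out[m + n // 2] = even[m] - twiddle
    return out

random.seed(1)
f = [random.random() for _ in range(64)]
max_diff = max(abs(s - q) for s, q in zip(dft_direct(f), fft_recursive(f)))
print(max_diff)
```

Each level of recursion does work proportional to n and there are log2(n) levels, whereas the direct sum always evaluates n*n terms.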

Q8.1.1 Simply change the line: 

for rec in constants[-10:]: 

to: 

for rec in constants[constants['rel_unc'] > 0][:10]:

The most accurately known constant is the electron g-factor. 

2.64693e-07 ppm: electron g factor = -2.00232 

2.69687e-07 ppm: electron mag. mom. to Bohr magneton ratio = -1.00116 
3.7956e-06 ppm: electron magn. moment to Bohr magneton ratio = -1.00116 
4.96096e-06 ppm: atomic unit of time = 2.41888e-17 s 

Q8.1.2 The calculation $N/V = p/k_\mathrm{B}T$ for the stated conditions can be done
entirely with constants from scipy.constants:

In [x]: scipy.constants.atm / scipy.constants.k / scipy.constants.zero_Celsius
Out[x]: 2.686780501003883e+25

This is the Loschmidt constant, which is defined by the 2010 CODATA standards and
included in scipy.constants (see the documentation for details):

In [x]: from scipy import constants
In [x]: constants.value('Loschmidt constant (273.15 K, 101.325 kPa)')
Out[x]: 2.6867805e+25

Q8.2.1 By numerical integration, the result is seen to be 3:

In [x]: from scipy.integrate import quad
In [x]: import numpy as np
In [x]: func = lambda x: np.floor(x) - 2*np.floor(x/2)
In [x]: quad(func, 0, 6)
Out[x]: (2.999964948683555, 0.0009520766614606472)
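The value 3 can be understood by noting that floor(x) - 2*floor(x/2) is 1 when floor(x) is odd and 0 when it is even, so on the interval from 0 to 6 the integrand is 1 on three unit-width intervals. The sketch below checks this with a simple midpoint rule written in pure Python (a different technique from quad, used here only as an independent check); the function names are illustrative.

```python
import math

def func(x):
    """ 1 if floor(x) is odd, 0 if it is even. """
    return math.floor(x) - 2 * math.floor(x / 2)

def midpoint_rule(f, a, b, n):
    """ Approximate the integral of f over [a, b] using n equal subintervals. """
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# With 600 subintervals no midpoint falls on one of the integer discontinuities
estimate = midpoint_rule(func, 0, 6, 600)
print(estimate)
```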

Q8.2.2 In the following we assume the following imports: 

In [x]: import numpy as np 

In [x]: from scipy.integrate import quad 


a. In [x]: f1 = lambda x: x**4 * (1 - x)**4 / (1 + x**2)

   In [x]: quad(f1, 0, 1)
   Out[x]: (0.0012644892673496185, 1.1126990906558069e

   In [x]: 22/7 - np.pi
   Out[x]: 0.0012644892673496777

b. In [x]: f2 = lambda x: x**3/(np.exp(x) - 1)

   In [x]: quad(f2, 0, np.inf)
   Out[x]: (6.49393940226683, 2.628470028924825e-09)

   In [x]: np.pi**4 / 15
   Out[x]: 6.493939402266828

c. In [x]: f3 = lambda x: x**-x

   In [x]: quad(f3, 0, 1)
   Out[x]: (1.2912859970626633, 3.668398917966442e-11)

   In [x]: sum(n**-n for n in range(1, 20))
   Out[x]: 1.2912859970626636

d. In [x]: from scipy.special import factorial

   In [x]: f4 = lambda x, p: np.log(1/x)**p

   In [x]: for p in range(10):
      ...:     print(quad(f4, 0, 1, args=(p,))[0], factorial(p))
   1.0 1.0
   0.9999999999999999 1.0
   1.9999999999999991 2.0
   6.000000000000064 6.0
   24.000000000000014 24.0
   119.9999999999327 120.0
   719.9999999989705 720.0
   5039.99999945767 5040.0
   40320.00000363255 40320.0
   362880.00027390465 362880.0

e. In [x]: from scipy.special import i0
   In [x]: z = np.linspace(0, 2, 100)

   In [x]: y1 = i0(z)

   In [x]: f5 = lambda theta, z: np.exp(z*np.cos(theta))

   In [x]: y2 = np.array([quad(f5, 0, 2*np.pi, args=(zz,))[0] for zz in z])

   In [x]: y2 /= 2 * np.pi
   In [x]: np.max(abs(y2 - y1))

   Out[x]: 3.4796610037801656e-12

Q8.2.3 To estimate $\pi$ by integration of the constant function $f(x, y) = 4$ over the
quarter circle with unit radius in the quadrant $x > 0$, $y > 0$:

In [x]: from scipy.integrate import dblquad

In [x]: dblquad(lambda y, x: 4, 0, 1, lambda x: 0, lambda x: np.sqrt(1 - x**2))

Out[x]: (3.1415926535897922, 3.533564552071766e-10)

Q8.2.4 The integral to be calculated is

$$\int_0^1 \int_0^{2\pi} r \,\mathrm{d}\theta \,\mathrm{d}r.$$

Note that the inner integral is over $\theta$ and the outer is over $r$. Therefore, the call to
dblquad should pass the function $f(r, \theta) = r$ as lambda theta, r: r (note the order
of the arguments).

In [x]: dblquad(lambda theta, r: r, 0, 1, lambda r: 0, lambda r: 2*np.pi)

Out[x]: (3.141592653589793, 3.487868498008632e-14)

Alternatively, swap the order of the integration:

In [x]: dblquad(lambda r, theta: r, 0, 2*np.pi, lambda theta: 0, lambda theta: 1)

Out[x]: (3.141592653589793, 3.487868498008632e-14)


Q8.4.1 Rewrite the equation as

$$f(x) = x + 1 + (x - 3)^{-3} = 0.$$

This function is readily plotted and the roots may be bracketed in $(-2, -0.5)$ and
$(0, 2.99)$ (avoiding the singularity at $x = 3$).

In [x]: from scipy.optimize import brentq

In [x]: f = lambda x: x + 1 + (x-3)**-3

In [x]: brentq(f, -2, -0.5), brentq(f, 0, 2.99)

Out[x]: (-0.984188231211512, 2.3303684533047426)
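A bracket is only valid if the function changes sign across it. The following sketch checks this for the two intervals above and then locates each root with a hand-written bisection; the helper bisect is a hypothetical illustration written for this check, not a SciPy function and not the method used in the solution.

```python
def f(x):
    return x + 1 + (x - 3) ** -3

def bisect(func, a, b, tol=1e-12):
    """Find a root of func in [a, b] by repeated interval halving."""
    fa, fb = func(a), func(b)
    if fa * fb > 0:
        raise ValueError("func(a) and func(b) must have opposite signs")
    while b - a > tol:
        c = 0.5 * (a + b)
        fc = func(c)
        if fa * fc <= 0:
            b, fb = c, fc      # root lies in [a, c]
        else:
            a, fa = c, fc      # root lies in [c, b]
    return 0.5 * (a + b)

roots = []
for a, b in [(-2, -0.5), (0, 2.99)]:
    print("f(a) = {:.4g}, f(b) = {:.4g}".format(f(a), f(b)))
    roots.append(bisect(f, a, b))
print(roots)
```

Bisection needs many more function evaluations than brentq, but it makes the role of the sign change explicit.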


Q8.4.2 Some examples of root-finding for which the Newton-Raphson algorithm
fails, and how to work around each failure.

a. In [x]: newton(lambda x: x**3 - 5*x, 1, lambda x: 3*x**2 - 5)

RuntimeError: Failed to converge after 50 iterations, value is 1.0

The Newton-Raphson algorithm enters an endless cycle of values for x:

$x_0 = 1$:  $x_1 = x_0 - f(x_0)/f'(x_0) = -1$
$x_1 = -1$: $x_2 = x_1 - f(x_1)/f'(x_1) = 1$
$x_2 = 1$:  $x_3 = x_2 - f(x_2)/f'(x_2) = -1$

Alternative starting points converge correctly on a root. Even a very small
displacement from $x_0 = 1$ ensures convergence:


In [x]: newton(lambda x: x**3 - 5*x, 1.0001, lambda x: 3*x**2 - 5)
Out[x]: 2.23606797749979

In [x]: newton(lambda x: x**3 - 5*x, 1.1, lambda x: 3*x**2 - 5)
Out[x]: -2.23606797749979

In [x]: newton(lambda x: x**3 - 5*x, 0.5, lambda x: 3*x**2 - 5)
Out[x]: 0.0
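The cycle in part (a) can be reproduced by applying the Newton-Raphson update by hand. The helper newton_iterates below is a hypothetical name introduced for this sketch; it simply records successive iterates and is not how scipy.optimize.newton is implemented internally.

```python
def newton_iterates(f, fprime, x0, nsteps):
    """Return the list [x0, x1, ..., x_nsteps] of Newton-Raphson iterates."""
    xs = [x0]
    for _ in range(nsteps):
        x = xs[-1]
        xs.append(x - f(x) / fprime(x))
    return xs

f = lambda x: x**3 - 5*x
fp = lambda x: 3*x**2 - 5

cycle = newton_iterates(f, fp, 1.0, 6)
print(cycle)    # alternates between 1.0 and -1.0

# A slightly displaced starting point escapes the cycle
print(newton_iterates(f, fp, 1.0001, 50)[-1])
```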




b. In [x]: f, fp = lambda x: x**3 - 3*x + 1, lambda x: 3*x**2 - 3

In [x]: newton(f, 1, fp)
Out[x]: 1.0

In [x]: f(1.0)
Out[x]: -1.0

The algorithm converged, but not on a root! Unfortunately, the gradient of the
function is zero at the chosen starting point and because of round-off error this
has not led to a ZeroDivisionError. To find the roots, choose different starting
points such that $f'(x_0) \neq 0$, or use a different method after bracketing the roots
by inspection of a plot of the function:

In [x]: brentq(f, -0.5, 0.5), brentq(f, -2, -1.5), brentq(f, 1, 2)

Out[x]: (0.34729635533386066, -1.879385241571423, 1.532088886237956)

c. The function $f(x) = 2 - x^5$ has a flat plateau around $f(0) = 2$ and the small
gradient here leads to slow convergence on the root:

In [x]: f, fp = lambda x: 2 - x**5, lambda x: -5*x**4

In [x]: newton(f, 0.01, fp)

RuntimeError: Failed to converge after 50 iterations, value is ...

To find it using newton, either move the starting point closer to the root or
increase the maximum number of iterations:

In [x]: newton(f, 0.01, fp, maxiter=100)

Out[x]: 1.148698354997035
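To see why 50 iterations are not enough here, the iterations can be counted explicitly. The function newton_count is a hypothetical helper written for this sketch; its stopping rule (a step smaller than 1.48e-8) is an assumption chosen to resemble the default tolerance of scipy.optimize.newton.

```python
def newton_count(f, fprime, x0, tol=1.48e-8, maxiter=1000):
    """Newton-Raphson iteration returning (root, number of iterations)."""
    x = x0
    for n in range(1, maxiter + 1):
        x_new = x - f(x) / fprime(x)
        if abs(x_new - x) < tol:
            return x_new, n
        x = x_new
    raise RuntimeError("Failed to converge after %d iterations" % maxiter)

f = lambda x: 2 - x**5
fp = lambda x: -5 * x**4

root, niter = newton_count(f, fp, 0.01)
print(root, niter)
```

The first step from the plateau overshoots to a very large x, after which each iteration reduces x by only about 20 per cent until it nears the root.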

d. This is another example of a function that generates an endless cycle of values 
from the Newton-Raphson method: 

In [x]: f = lambda x: x**4 - 4.29 * x**2 - 5.29 
In [x]: fp = lambda x: 4*x**3 - 8.58 * x 
In [x]: newton(f, 0.8, fp) 

RuntimeError: Failed to converge after 50 iterations, value is ... 
Unlike the function in (a), where only the exact starting point cycles, here every
starting point in the region $0.6 < x_0 < 1.1$ is drawn into the cyclic behavior,
so one needs to initialize the algorithm outside this range to obtain the roots $\pm 2.3$.
For example,

In [x]: newton(f, 1.2, fp) 

Out[x]: -2.3 


Q8.4.3 In general, there are two (physically distinct) possible angles $\theta_0$ corresponding
to the projectile passing through the specified point, $(x_1, z_1) = (5, 15)$, on the way
up or on the way down. These values are the roots in $(0, \pi/2)$ of the function

$$f(\theta_0; x_1, z_1) = x_1 \tan\theta_0 - \frac{g x_1^2}{2 v_0^2 \cos^2\theta_0} - z_1.$$

After bracketing the roots with a rough plot of $f(\theta_0)$, we can use brentq:

In [x]: g = 9.81

In [x]: v0, x1, z1 = 25, 5, 15

In [x]: f = lambda theta0, x1, z1: x1 * np.tan(theta0) - g / 2\
                * (x1 / v0 / np.cos(theta0))**2 - z1
In [x]: th1 = brentq(f, 1, 1.4, args=(x1, z1))

In [x]: th2 = brentq(f, 1.5, 1.6, args=(x1, z1))

In [x]: np.degrees(th1), np.degrees(th2)

Out[x]: (74.172740936822834, 87.392310240255171)

That is, $\theta_0 = 74.2^\circ$ or $\theta_0 = 87.4^\circ$.
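These angles can be cross-checked analytically. Writing $t = \tan\theta_0$ and using $1/\cos^2\theta_0 = 1 + t^2$, the condition $f = 0$ becomes the quadratic $a t^2 - x_1 t + (a + z_1) = 0$ with $a = g x_1^2 / (2 v_0^2)$. The sketch below solves it with the standard library only; the variable names a, disc, t_low and t_high are introduced here for illustration.

```python
import math

g = 9.81
v0, x1, z1 = 25, 5, 15

# Coefficient of the quadratic in t = tan(theta0)
a = g * x1**2 / (2 * v0**2)
disc = x1**2 - 4 * a * (a + z1)          # discriminant of a*t^2 - x1*t + (a + z1)

t_low = (x1 - math.sqrt(disc)) / (2 * a)
t_high = (x1 + math.sqrt(disc)) / (2 * a)

angles = [math.degrees(math.atan(t)) for t in (t_low, t_high)]
print(angles)
```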

Q9.1.1 Let $x = 0.9999\ldots$. Then,

$$10x = 9.9999\ldots = 9 + x \;\Rightarrow\; 9x = 9 \;\Rightarrow\; x = 1.$$


Q9.1.2 This occurs because math.pi is only a (double-precision floating point)
approximation to $\pi$, and the tangent of this approximate value happens to be negative:

In [x]: math.tan(math.pi) 

Out[x]: -1.2246467991473532e-16 

Taking the square root leads to the math domain error. 
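The failure can be demonstrated in a few lines of standard-library Python. This is a minimal sketch of the behavior described above; the variable names are chosen here for illustration.

```python
import math

t = math.tan(math.pi)     # tiny negative number rather than exactly zero
print(t)

try:
    math.sqrt(t)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```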

Q9.1.3 The problem, of course, is that the expression has been written using double- 
precision floating point numbers and the difference between the sum of the first two 
terms and the third is smaller than the precision of this representation. Using the exact 
representation in integer arithmetic, 

In [x]: 844487**5 + 1288439**5 

Out[x]: 3980245235185639013055619497406 

In [x]: 1318202**5

Out[x]: 3980245235185639013290924656032 

giving a difference of 

In [x]: 844487**5 + 1288439**5 - 1318202**5 
Out[x]: -235305158626 

The finite precision of the floating point representation used, however, truncates the 
decimal places before this difference is apparent: 
Figure A.2 A comparison of the numerical behavior of $f(x) = (1 - \cos^2 x)/x^2$ and
$g(x) = \sin^2 x/x^2$ close to $x = 0$.


In [x]: 844487.**5 + 1288439.**5 
Out [x]: 3.980245235185639e+30 
In [x]: 1318202.**5 
Out [x]: 3.980245235185639e+30 

This is an example of catastrophic cancellation. 
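The two calculations can be placed side by side. In this sketch the exact integer difference is compared with the approximate spacing between adjacent double-precision numbers of this magnitude (estimated as the machine epsilon times the value, which is an approximation made here for illustration).

```python
import sys

a, b, c = 844487, 1288439, 1318202

# Exact arbitrary-precision integer arithmetic
exact_diff = a**5 + b**5 - c**5
print(exact_diff)

# The same expression in double-precision floating point
float_lhs = float(a)**5 + float(b)**5
float_rhs = float(c)**5
print(float_lhs, float_rhs)

# Approximate gap between neighboring doubles near 4e30
spacing = sys.float_info.epsilon * float_rhs
print(spacing)
```

The exact difference is orders of magnitude smaller than this spacing, so the floating point representation cannot resolve it.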

Q9.1.4 The expression 1 - np.cos(x)**2 suffers from catastrophic cancellation
close to x = 0, resulting in a dramatic loss of precision and wild oscillations in the plot
of f(x) (Figure A.2). Consider, for example, x = 1.e-9: in this case, the difference
1 - np.cos(x)**2 is indistinguishable from zero (at double precision) so f(x)
returns 0. Conversely, np.sin(x)**2 is indistinguishable from x**2 and g(x) returns
1.0 correctly.
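The single-point case x = 1.e-9 can be checked directly. The sketch below uses the standard math module instead of NumPy, which is an assumption made to keep it dependency-free; the two expressions are mathematically identical because $\sin^2 x = 1 - \cos^2 x$.

```python
import math

x = 1.e-9

# cos(x) rounds to exactly 1.0, so the subtraction loses all information
f_val = (1 - math.cos(x)**2) / x**2

# sin(x) is indistinguishable from x, so the ratio stays accurate
g_val = (math.sin(x) / x)**2

print(f_val, g_val)
```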

Listing A.3 A comparison of the numerical behavior of $f(x) = (1 - \cos^2 x)/x^2$ and $g(x) = \sin^2 x/x^2$
close to $x = 0$.

# qn9-1-c-cos-sin-a.py
import numpy as np
import pylab

f = lambda x: (1 - np.cos(x)**2)/x**2
g = lambda x: (np.sin(x)/x)**2

x = np.linspace(-0.0001, 0.0001, 10000)

pylab.plot(x, f(x))
pylab.plot(x, g(x))
pylab.ylim(0.99995, 1.00005)
pylab.show()


Q9.1.5 We cannot compare with == because nan is not equal to itself. However, it
is the only floating point number that is not equal to itself, so use != instead:

In [x]: c = 0 * 1.e1000    # 0 * inf is nan
In [x]: c != c

Out[x]: True    # c isn't equal to itself, so must be nan
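For comparison, the same test can be written alongside the standard library function math.isnan, which performs this check explicitly. This is a small sketch; using math.isnan is an alternative offered here and is not part of the original answer.

```python
import math

c = 0 * float("inf")      # 0 * inf is nan

print(c == c)             # nan never compares equal, even to itself
print(c != c)             # so this self-inequality identifies nan
print(math.isnan(c))      # explicit test from the standard library
```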